──────────────────────────────
INSIDE TURBO PASCAL UNIT FILES
Version 6.0 for MS-DOS
Version 1.0 for WINDOWS
──────────────────────────────
by
William L. Peavy
────────────────
June 6, 1991
ABSTRACT
If you want to know what is in a .TPU (unit) file produced
by either Version 1.0 of Turbo Pascal for Windows or by
Version 6.0 of Turbo Pascal from Borland International, then
this paper is for you. It doesn't explain quite everything
since I don't have access to secret documents or
anything like that and since some of the data in .TPU files
just doesn't have enough auxiliary information to make its
role clear. However, it is possible to learn a great deal
about how Turbo Pascal organizes the information it needs to
refer to, and it is also possible to learn just what kind of
code the compiler produces.
This is the fourth in a series of reports on the subject of
Turbo Pascal Units, the previous reports covering Turbo
Pascal Versions 5.0 through 6.0. The evolution of these
files in the face of changing requirements has been
fascinating to behold and deciphering their contents has
been challenging to say the least.
The programs supplied with this report have been reorganized
from their 6.0 style and many identifiers have been changed.
There are also a few bug fixes and algorithm changes. Other
changes were dictated by the changes in the utilization of
the TPU file itself by the Windows Compiler.
Since I have a "real" job which requires my full attention,
and since it doesn't involve use of these products in any
direct way, I am usually hard-pressed to find the personal
time to conduct this research. Consequently, I always
refuse to commit to follow-up or even error correction. It
would be irresponsible of me to pretend it could be
otherwise. Even so, this is a revised report which contains
a few error fixes and discusses the newly enhanced program
which incorporates these fixes and sports some enhanced
capabilities.
Contents
1. Introduction
1.1 Caveats
1.2 Evolution
1.3 Treatment
2. Gross File Structure
2.1 User Units
2.2 SYSTEM Unit
3. Locators
3.1 Local Links
3.2 Global Links
3.3 Table Offsets
3.4 Basic Relationships
4. Unit Header
4.1 Description
4.2 UNIT Size
5. Symbol Dictionaries
5.1 Organization
5.2 Interface Dictionary
5.3 Debug Dictionary
5.4 Dictionary Elements
5.4.1 Hash Tables
5.4.1.1 Size
5.4.1.2 Scope
5.4.1.3 Special Cases
5.4.2 NAME ENTRIES
5.4.3 NAME Stubs
5.4.3.1 Label Declaratives ("O")
5.4.3.2 Un-Typed Constants ("P")
5.4.3.3 Named Types ("Q")
5.4.3.4 Variables, Fields, Typed Cons ("R")
5.4.3.5 Subprograms & Methods ("S")
5.4.3.6 Turbo Std Procedures ("T")
5.4.3.7 Turbo Std Functions ("U")
5.4.3.8 Turbo Std "NEW" Routine ("V")
5.4.3.9 Turbo Std Port Arrays ("W")
5.4.3.10 Turbo Std External Variables ("X")
5.4.3.11 Units ("Y")
5.4.4 Type Descriptors
5.4.4.1 Scope
5.4.4.2 Prefix Part
5.4.4.3 Suffix Parts
5.4.4.3.1 Un-Typed
5.4.4.3.2 Structured Types
5.4.4.3.2.1 ARRAY Types
5.4.4.3.2.2 RECORD Types
5.4.4.3.2.3 OBJECT Types
5.4.4.3.2.4 FILE (non-TEXT) Types
5.4.4.3.2.5 TEXT File Types
5.4.4.3.2.6 SET Types
5.4.4.3.2.7 POINTER Types
5.4.4.3.2.8 STRING Types
5.4.4.3.3 Floating-Point Types
5.4.4.3.4 Ordinal Types
5.4.4.3.4.1 "Integers"
5.4.4.3.4.2 BOOLEANs
5.4.4.3.4.3 CHARs
5.4.4.3.4.4 ENUMERATions
5.4.4.3.5 SUBPROGRAM Types
6. Maps and Lists
6.1 PROC Map
6.2 CSeg Map
6.3 Typed CONST DSeg Map
6.4 Global VAR DSeg Map
6.5 DLL LIST
6.6 Donor Unit List
6.7 Source File List
6.8 DEBUG Trace Table
7. Code, Data, Fix-Up Info
7.1 Object CSegs
7.2 CONST DSegs
7.3 Fix-Up Data Tables
8. Supplied Program
8.1 TWU1
8.1.1 Unit TWU1EQU
8.1.2 Unit TWU1RPT
8.1.3 Unit TWU1UAM
8.1.4 Unit TWU1UNA
8.2 Notes on Program Logic
8.2.1 Formatting the Dictionary
8.2.2 The Disassembler
9. Unit Libraries
9.1 Library Structure
10. Inferences Drawn from Analyses
10.1 Linker Granularity
10.2 Floating-Point Emulation
10.2.1 Version 6.0 Compiler For MS-DOS
10.2.2 Version 1.0 Compiler For WINDOWS
11. Application Notes
12. Acknowledgements
13. References
14. INDEX
Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────
1. INTRODUCTION
This document is the outcome of an inquiry conducted into the
structure and content of Borland Turbo Pascal for Windows (Version
1.0) Unit files. This followed naturally from previous inquiries into
the structure of Unit Files for versions 5.0-6.0 of Borland's Turbo
Pascal Compilers. I was further stimulated to undertake this as a
result of a brief conversation I had with the Principal Architect of
Turbo Pascal, Mr. Anders Hejlsberg, in Houston at the HAL-PC meeting
that served as the platform for the formal announcement of Turbo
Pascal for Windows.
1.1 CAVEATS
The material contained herein represents the findings and
interpretations of the author. A great deal of guess-work was
required and no assurances are given as to the accuracy of either the
findings of fact or the inferences contained herein which are the sole
work-product of the author. In particular, only the materials and
information that any normal Borland customer has access to were
available to the author. Further, no Borland source-codes were
available as the Library Routine source is not licensed to the author.
In short, there was nothing irregular about how these findings were
achieved.
The material contained herein is placed in the public domain free of
copyright for use of the general public at its own risk. The author
assumes no liability for any damages arising from the use of this
material by others. If you make use of this information and you get
burned, TOUGH! The author accepts no obligation to correct any such
errors as may exist in the supplied programs or in the findings of
fact or opinion contained herein.
On the other hand, this is not a "complete" work in that a great many
questions remain open, especially as regards fine details. The author
is highly-qualified in neither Intel 80xxx Assembly Language nor in
Windows 3.0 application programming and several open questions might
best be addressed by persons competent in these areas. The author
welcomes the input of interested readers who might be able to "flesh
out" some of these open questions with "hard" answers so that all
might benefit from their expertise.
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 5
1.2 EVOLUTION
The Unit first appeared in Turbo Pascal Version 4.0 (for MS-DOS) along
with the ability to create ".EXE" instead of ".COM" files. This
author began delving into these Unit files beginning with Version 5.0
of Turbo Pascal and each new version of the MS-DOS based product has
seen significant changes in both the form and the content of ".TPU"
files.
In contrast, careful study should make it plain that the Unit File
produced by Turbo Pascal for Windows is remarkably similar to that
produced by Turbo Pascal Version 6.0 (for MS-DOS).
In the main, the files produced by the MS-DOS product (TP6) were rich
with apparently useless fields within some of the data structures. In
essence, the Windows product (TPW) has made use of these fields in a
coherent way that makes the Version 6 units appear to be subsets of
the Windows Units as far as format is concerned.
The Windows version development must have been well-advanced when the
DOS version (6.0) hit the streets. In fact, Mr. Anders Hejlsberg did
confirm my speculation that the compiler "engine" used in the Windows
Product is the same as that used in version 6 of the DOS Product.
1.3 TREATMENT
This report covers BOTH Turbo Pascal for Windows and Turbo Pascal
Version 6.0 (for MS-DOS). It views Unit Files for the MS-DOS version
as sub-sets of those for the Windows version from the standpoint of
structure. Because of this, the supplied program is able to process
".TPU" files from either compiler with little or no special handling.
This doesn't mean that Version 6.0 Units can be combined with Windows
Applications! When an application (program) is built by either of the
compilers, ALL units must have been compiled by that same compiler if
for no other reason than that the SYSTEM Unit (for one) is uniquely
tailored to each of these environments.
2. GROSS FILE STRUCTURE
A Turbo Pascal Unit file consists of an array of bytes that is some
exact multiple of sixteen (16). "Signature" information allows the
compiler to verify that the .TPU file was compiled with the correct
compiler version and to verify that the file is of the correct size.
The fine structure of the file will be addressed in later sections at
ever increasing levels of detail.
Graphically, the file may be regarded as having the following general
layout (major sections bounded by ═ )
╔═══════════════════╗
║ Unit Header ║ Main Index to Unit File
╟───────────────────╢
║ Dictionaries: ║
║ a) Interface ║
║ b) Debug * ║ For Local Symbol Access
╟───────────────────╢
║ PROC Map ║
╟───────────────────╢
║ CSeg Map * ║ May be Empty
╟───────────────────╢
║ CONST DSeg Map * ║ May be Empty
╟───────────────────╢
║ VAR DSeg Map * ║ May be Empty
╟───────────────────╢
║ DLL List * ║ May be Empty
╟───────────────────╢
║ Donor Units * ║ May be Empty
╟───────────────────╢
║ Source Files ║
╟───────────────────╢
║ Trace Table * ║ May be Empty
╠═══════════════════╣
║ CODE Group * ║ May be Empty
╠═══════════════════╣
║ DATA Group * ║ May be Empty
╠═══════════════════╣
║ Code Fix-Ups * ║ May be Empty
╠═══════════════════╣
║ Data Fix-Ups * ║ May be Empty
╚═══════════════════╝
Each of the sections outlined by double lines is capable of being up
to 64K bytes long. The Dictionary Area begins with the Unit Header
and continues through the Trace Table.
2.1 USER UNITS
Units compiled by ordinary users have a very straightforward
appearance and content. The SYSTEM.TPU file is quite another thing
however.
2.2 SYSTEM UNIT
The SYSTEM.TPU file (found in TURBO.TPL and in the TPW.TPL file) is
unique in several respects. It contains several types of entries that
just don't seem to be achievable by ordinary users, and the
arrangement of the entries in the dictionary is unique. Normally, the
Name Entry for the Unit immediately follows the hash table but, in the
"SYSTEM" unit, this is not true. Rather, the hash table is followed
by all the descriptors for the built-in types, followed by descriptors
for the standard procedures and functions, followed by the Name Entry
for the Unit, followed by the conventional dictionary entries
achievable by normal PASCAL coding such as the Typed Constants and
Variables defined in the "SYSTEM" unit.
Try to compile a Unit named "SYSTEM" and you find that the compiler
wants a file called "SYSTEM.TPS". I suspect that "SYSTEM.TPS" is a
file that contains a pre-initialized interface hash table plus the
descriptors for the standard types and the descriptors for the built-
in procedures and functions stored in the "SYSTEM" Unit (which would
otherwise require special syntax to define).
The compiler can't operate normally without a "SYSTEM" unit so this
file probably provides a "bootstrap" mechanism for the built-in
descriptors needed to build "SYSTEM.TPU".
3. LOCATORS
The data in these files has need of structure and organization to
support efficient access by the various programs such as the compiler,
the linker and the debugger. This organization is built on a solid
foundation of locators employed in the unit's data structures.
3.1 LOCAL LINKS
Local Links (LL's) are items of type WORD (2 bytes) which contain an
offset which is relative to the origin of the Dictionary Area of the
unit. This implies that the Dictionary Area must be somewhat less
than 64K bytes in size. If the Dictionary Area is loaded into the
heap, then an LL can be used to locate any byte in the Dictionary
Area. (See Below)
Type LL = Word; { Local Scope Locators }
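As a rough illustration of the idea (a sketch, not code from the supplied program), the following treats a heap buffer as a loaded Dictionary Area and uses an LL as a plain index from its origin. The buffer size, the names TDictArea and ByteAt, and the sample byte are assumptions made for the example.

```pascal
program LLDemo;
{ Sketch only: a heap buffer stands in for a loaded Dictionary Area. }
type
  LL        = Word;                     { Local Scope Locator }
  TDictArea = array[0..1023] of Byte;   { hypothetical size; real areas may approach 64K }
  PDictArea = ^TDictArea;

{ Return the byte that Link locates, counting from the origin of Dict^. }
function ByteAt(Dict: PDictArea; Link: LL): Byte;
begin
  ByteAt := Dict^[Link];
end;

var
  Dict: PDictArea;
begin
  New(Dict);
  FillChar(Dict^, SizeOf(Dict^), 0);
  Dict^[$0040] := Ord('Q');             { plant a sample byte at offset $0040 }
  WriteLn('Byte located by LL $0040: ', Chr(ByteAt(Dict, $0040)));
  Dispose(Dict);
end.
```

Because the LL is relative to the area's origin, the same value stays valid wherever the heap manager happens to place the buffer.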
3.2 GLOBAL LINKS
Global Links (LG's) are used to locate type descriptors and to locate
allocation data for variables with the ABSOLUTE attribute which may
reside in other Units (i.e., units external to the present unit).
LG's are structured items consisting of two (2) words (see below).
     LG = RECORD
            UntLL: LL;  { To item in Unit Named by LL below }
            UntId: LL;  { Stub Type "Y" Name Entry in our Unit }
          END;
The first of these is an LL that is relative to the origin of the
Dictionary Area of the (possibly external) unit. It locates either a
Type Descriptor or the stub of the Name entry which establishes
storage allocation. The second word is an LL which locates the stub
of the Name entry in the current unit dictionary for the (possibly
external) target unit. The Name entry for this stub identifies the
name of the unit that contains the item the LG points to.
This provides a handy mechanism for locating type descriptors and
allocation information which may be defined in other separately
compiled units.
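A minimal sketch of picking an LG apart, assuming the four bytes have already been read from the file and that each word is stored low byte first (the normal Intel order). The type TBytes4, the procedure DecodeLG and the sample byte values are made up for the illustration.

```pascal
program LGDemo;
type
  LL = Word;
  LG = record
    UntLL: LL;   { offset inside the dictionary of the unit that owns the item }
    UntId: LL;   { offset of the "Y" stub, in OUR dictionary, naming that unit }
  end;
  TBytes4 = array[0..3] of Byte;   { hypothetical: four raw bytes from the file }

{ Assemble an LG from four consecutive bytes, low byte of each word first. }
procedure DecodeLG(var Raw: TBytes4; var G: LG);
begin
  G.UntLL := Raw[0] or (Word(Raw[1]) shl 8);
  G.UntId := Raw[2] or (Word(Raw[3]) shl 8);
end;

var
  Raw: TBytes4;
  G: LG;
begin
  Raw[0] := $34;  Raw[1] := $12;   { UntLL = $1234 }
  Raw[2] := $4E;  Raw[3] := $00;   { UntId = $004E }
  DecodeLG(Raw, G);
  WriteLn('UntLL = ', G.UntLL, ', UntId = ', G.UntId);
end.
```

Following the link is then a two-step affair: UntId is looked up in the current unit to learn which unit is meant, and UntLL is applied to that unit's Dictionary Area.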
3.3 TABLE OFFSETS
Finally, various data-structures within a .TPU file are organized as
arrays of fixed-length records or as lists of variable-length records.
Efficient access to such records is achieved by means of offsets
rather than subscripts (an addressing technique denied Pascal). These
offsets are relative to the origin of the array or list being
referenced.
3.4 BASIC RELATIONSHIPS
╔═══>           ┌────────────────┐        ┌──────────────────────┐
║     ┌────────<┤ Unit Header    │        │ Symbol Dictionary    │
║  D  │         └────────────────┘        │ (names, types etc)   │
║  I  │   LL    ┌────────────────┐  LL's  │ defined in INTERFACE │
║  C  ├────────>┤ INTERFACE Hash ├───────>┤                      │
║  T  │         └────────────────┘        └──────────┬───────────┘
║  I  │   LL    ┌────────────────┐  LL's  ┌──────────┴───────────┐
║  O  ├────────>┤ DEBUG Hash     ├───────>┤ DEBUG Dictionary     │
║  N  │         └────────────────┘        │ Local Symbol option  │
║  A  │   LL    ┌────────────────┐        │ builds this. Holds   │
║  R  ├────────>┤ PROC Map Table │        │ names and types etc  │
║  Y  │         └────────────────┘        │ from IMPLEMENTATION  │
║     │   LL    ┌────────────────┐        │ Linked to INTERFACE  │
║  A  ├────────>┤ CSeg Map Table │?       │ part by LL's.        │
║  R  │         └────────────────┘        │                      │
║  E  │   LL    ┌────────────────┐        └──────────────────────┘
║  A  ├────────>┤ DSeg Map CONST │?
║     │         └────────────────┘
║     │   LL    ┌────────────────┐
║     ├────────>┤ DSeg Map VAR's │?
║     │         └────────────────┘
║     │   LL    ┌────────────────┐
║     ├────────>┤ DLL List       │?
║     │         └────────────────┘        IMPORTANT NOTES
║     │   LL    ┌────────────────┐        ──────────────────────
║     ├────────>┤ Donor Unit List│?       Some of the structures
║     │         └────────────────┘        shown in this figure
║     │   LL    ┌──────────────────┐      are built only if they
║     ├────────>┤ Source File List │      are needed. These are
║     │         └──────────────────┘      marked by a "?" next
║     │   LL    ┌──────────────────┐      to the box.
║     ├────────>┤ Debug Step Ctls  │?
╚═══> │         └──────────────────┘      If the DEBUG Dictionary
      │   **    ┌───────────────┐         is missing, its LL
      ├────────>┤ CODE Segments │?        leads directly to the
      │         └───────────────┘         INTERFACE Dictionary.
      │   **    ┌─────────────────┐       ──────────────────────
      ├────────>┤ CONST DATA Segs │?
      │         └─────────────────┘
      │   **    ┌────────────────┐
      ├────────>┤ CODE Fix-Ups   │?
      │         └────────────────┘
      │   **    ┌────────────────┐
      └────────>┤ CONST Fix-Ups  │?
                └────────────────┘
This figure illustrates the role of the Unit Header in tying together
the various data structures in the Unit. The type of link is shown
next to a flow-line by "LL", "LG" or "**". "LL" and "LG" are explicit
pointers, while "**" marks a locator for which no explicit pointer
exists; its value is computed from other data in the Unit Header.
  ┌────(from hash tables, other Name Entries)
  │
  │    ┌─────────────┬──────────────────────────────────┐
  │    │ Header Part │ Stub Part -- many formats        │
  └───>┤ - - - - - - │ - - - ┌───────────────────────── │
       │             │ data, │ Some stubs have embedded │  Name
       │ Name, Class │ links │ Type Descriptors         │  Entry
       │ and link to │ (see  │ ┌───────────────────     │
       │ prior entry │ below)│ │ INLINE Declarative     │
       │ having same │   *   │ │ code bytes for a       │
       │ hash-if any │   │   │ │ "macro" type PROC      │
       └─────────────┴───│──────────────────────────────┘
              ┌──────────┘
              │
              │  FAR pntr  ┌────────────────────────────┐
              ├───────────>┤ Absolute Memory Locations  │
              │            └────────────────────────────┘
              │            ┌─────────────────────────────┐
              │   LG's     │ Type Descriptors and stubs  │
              ├───────────>┤ of Dictionary Entries used  │
              │            │ for absolute equivalences   │
              │            └─────────────────────────────┘
              │            ┌─────────────────────────────────┐
              │   LL's     │ Nested Scope Hash Tables        │
              ├───────────>┤ Parent Scope Dictionary Entries │
              │            │ Record Fields                   │
              │            │ Object Fields/Methods           │
              │            └─────────────────────────────────┘
              │            ┌──────────────────────┐
              │  Offsets   │ CONST DSeg Map Table │
              └───────────>┤ PROC Map Table       │
                           │ VAR DSeg Map Table   │
                           └──────────────────────┘
This figure illustrates the many types of entities that associate with
Name Entries and particularly with their Stub Parts. Not all of the
links shown occur in a single Stub format, but all of the links in the
figure can and do exist in selected cases. The purpose here is to
show the flexibility of the system of links in associating required
data with the Name Entry and its identifying symbol.
While it may not be apparent from the figure, the dictionary structure
as a whole may be viewed as a cyclic directed graph which is rooted in
the DEBUG Hash Table. The recursive properties exhibited by the node
relationships permit direct support of the scope rules of Turbo Pascal
with simplicity and elegance. As one might expect, the representation
of the required information lends itself to efficient use of storage
since the representations are compact and there is very little in the
way of redundancy. The small amount of redundancy that does exist is
apparently aimed at speeding access to certain structures by the Turbo
components (compiler, linker and debugger).
  ┌────(implied links, explicit LG's from other structures)
  │
  │    ┌─────────────────────────────────────────────┐
  │    │ Flags and codes, allocation widths for data │  Type
  └───>┤ and VMT's, subrange constraints, formal     │  Descriptor
       │ parameter descriptors, implicit associated  │  Contents &
       │ type descriptors, LL's, LG's and Offsets.   │  Linkages
       └──────┬──────────────────────────────────────┘
              │
              │
              │     LG's      ┌──────────────────┐
              ├──────────────>┤ Type Descriptors │
              │               └──────────────────┘
              │
              │               ┌───────────────────────────────┐
              │     LL's      │ Method Name Entries           │
              ├──────────────>┤ Nested Scope Hash Tables      │
              │               │ Nested Scope Field Chains     │
              │               │ Parent Scope Name Entry       │
              │               └───────────────────────────────┘
              │
              │    Offsets    ┌──────────────────────────────────┐
              └──────────────>┤ VMT pointers in Object Instances │
                              │ CONST DSeg Map Table Entries     │
                              └──────────────────────────────────┘
This figure illustrates the relationships between Type Descriptors and
other structures in the dictionary. Not all the links shown can exist
with a single Type Descriptor since there are several variant forms of
these descriptors (depending on base type) but in combination, these
linkages are feasible. In addition to links, a great amount of data
is stored which is peculiar to a given type declaration. Descriptors
can be -- and are -- shared. Indeed, they were designed with that in
mind. Once a NAMED type is declared, all entities that reference it
are linked to it in some way (usually by an LG).
Almost every form of type descriptor is found in the SYSTEM unit and
this fact is used to advantage. When un-typed constants are declared,
a built-in type descriptor is referenced (via an LG) which provides
necessary information for maintenance of orderly dictionary structure.
When a named-type is declared, it is almost always decomposed into an
expression based on the built-in types of Turbo Pascal which are found
in the SYSTEM unit with the aid of an LG.
The semantics underlying the idea of the Unit mandate this very
approach since program modules of any class which make references to
units for definitions use the definitions as implemented by the unit
which contains them. Re-defining the unit or any of its defined types
leads to a natural requirement to re-compile those program modules
which rely on the unit for definitions. The impact is fundamental
since the storage representation of a unit-defined named type can
change in quite radical ways.
4. UNIT HEADER
The Unit Header comprises the first 64 bytes of the .TPU file. It
contains LL's that effectively locate all other sections of the .TPU
file plus statistics that enable a little cross-checking to be
performed. Some parts of the Unit Header appear to be reserved for
future use since no unit examined by this author has ever contained
non-zero data in these apparently reserved fields.
4.1 DESCRIPTION
The Unit Header provides a high-level locator table whereby each major
structure in the unit file can be addressed. The following provides a
Pascal-like explanation of the layout of the header followed by
further narrative discussion of the contents of the individual fields
in the Unit Header.
Type HdrAry     = Array[0..3] of Char;
     UnitHeader = Record
       UHEYE : HdrAry;  { +00 : = 'TPU9' }
       UHxxx : HdrAry;  { +04 : = $00000000 }
       UHUDH : LL;      { +08 : to Name Entry for This Unit }
       UHIHT : LL;      { +0A : to Hash Table (INTERFACE) }
       UHPMT : LL;      { +0C : to PROC Map }
       UHCMT : LL;      { +0E : to CSeg Map }
       UHTMT : LL;      { +10 : to DSeg Map-Typed CONST's }
       UHDMT : LL;      { +12 : to DSeg Map-GLOBAL Variables }
       UHDLL : LL;      { +14 : to DLL List (Windows Only) }
       UHLDU : LL;      { +16 : to Donor Unit List }
       UHLSF : LL;      { +18 : to Source file List }
       UHDBT : LL;      { +1A : to Debug Trace Step Controls }
       UHENC : LL;      { +1C : Size of Dictionary Area (UHZDA in the text) }
       UHZCS : Word;    { +1E : Size of CODE Group }
       UHZDT : Word;    { +20 : Size of Typed CONST Group }
       UHZFA : Word;    { +22 : Fix-Up Bytes (CODE Group) }
       UHZFT : Word;    { +24 : Fix-Up Bytes (Typed CONST's) }
       UHZFV : Word;    { +26 : Size of GLOBAL VAR Data }
       UHDHT : LL;      { +28 : to Hash Table (DEBUG) }
       UHSOV : Word;    { +2A : Flags - Mostly Unknown }
       UHPad : Array[0..9] of Word;
                        { +2C : Reserved for Future Expansion }
     End; { UnitHeader }
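The layout above can be exercised with a short test program. What follows is only a sketch under stated assumptions: the header is built in memory rather than read from a real .TPU file, the function name IsTPU9 is invented here, and the record is marked "packed" merely so that newer compilers do not insert padding.

```pascal
program HdrDemo;
type
  LL         = Word;
  HdrAry     = array[0..3] of Char;
  UnitHeader = packed record
    UHEYE : HdrAry;                          { +00 }
    UHxxx : HdrAry;                          { +04 }
    UHUDH, UHIHT, UHPMT, UHCMT, UHTMT, UHDMT,
    UHDLL, UHLDU, UHLSF, UHDBT, UHENC : LL;  { +08 .. +1C }
    UHZCS, UHZDT, UHZFA, UHZFT, UHZFV : Word;{ +1E .. +26 }
    UHDHT : LL;                              { +28 }
    UHSOV : Word;                            { +2A }
    UHPad : array[0..9] of Word;             { +2C }
  end;

{ True when the first four bytes of the header spell 'TPU9'. }
function IsTPU9(var H: UnitHeader): Boolean;
begin
  IsTPU9 := (H.UHEYE[0] = 'T') and (H.UHEYE[1] = 'P') and
            (H.UHEYE[2] = 'U') and (H.UHEYE[3] = '9');
end;

var
  H: UnitHeader;
begin
  FillChar(H, SizeOf(H), 0);                 { stand-in for a header read from disk }
  H.UHEYE[0] := 'T';  H.UHEYE[1] := 'P';
  H.UHEYE[2] := 'U';  H.UHEYE[3] := '9';
  WriteLn('Header size in bytes: ', SizeOf(H));
  WriteLn('Signature accepted  : ', IsTPU9(H));
end.
```

With a real file, the FillChar and the four assignments would give way to an untyped-file Reset(F, 1) followed by BlockRead(F, H, SizeOf(H)).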
UHEYE contains the characters "TPU9" in that order. This is
clear evidence that this unit was compiled by Turbo Pascal
Version 6.0 or by Turbo Pascal for Windows Version 1.0.
UHxxx is apparently reserved and contains binary zeros.
UHUDH contains an LL (WORD) which points to the Name Entry in
which the name of this unit is found.
UHIHT contains an LL (WORD) which points to a Hash table that is
the root of the Interface Dictionary graph.
UHPMT contains an LL (WORD) which points to the PROC Map for
this unit. The PROC Map contains an entry for each
Procedure or Function declared in the unit (except for
INLINE types), plus an entry for the Unit Initialization
section. The length of the PROC Map (in bytes) is
determined by subtracting this UHPMT from UHCMT.
UHCMT contains an LL (WORD) which points to the CSeg (CODE
Group) Map for this unit. The CSeg Map contains an entry
for each CODE Segment produced by the compiler plus an
entry for each of the CODE Segments included via the {$L
filename.OBJ} compiler directive. The length of this Map
(in bytes) is obtained by subtracting UHCMT from UHTMT.
The result may be zero in which case the CSeg Map is
empty.
UHTMT contains an LL (WORD) which points to the DSeg (DATA
Segment) Map that maps the initializing data for Typed
CONST items plus templates for VMT's (Virtual Method
Tables) and DMT's (Windows Dynamic Method Tables) that are
associated with OBJECTS which employ Virtual Methods. The
length of this Map (in bytes) is obtained by subtracting
UHTMT from UHDMT. The result may be zero in which case
this DSeg Map is empty.
UHDMT contains an LL (WORD) which points to the DSeg (DATA
Segment) Map that contains the specifications for DSeg
storage required by VARiables whose scope is GLOBAL. The
length of this Map (in bytes) is obtained by subtracting
UHDMT from UHDLL. The result may be zero in which case
this DSeg Map is empty.
UHDLL contains an LL (WORD) which points to the DLL list in
Windows. In Version 6.0, this is always zero.
UHLDU contains an LL (WORD) which points to a table of units
which contribute either CODE or DATA Segments to the .EXE
file for a program using this Unit. This is called the
"Donor Unit Table". The length of this table (in bytes)
is obtained by subtracting UHLDU from the word UHLSF. The
result may be zero in which case this table is empty.
UHLSF contains an LL (WORD) which points to a list of "source"
files. These are the files used as sources during
compilation. Examples are the Pascal Source for the Unit
itself, plus the .OBJ files linked via the {$L
filename.OBJ} compiler directive. The length of this list
(in bytes) is obtained by subtracting UHLSF from the word
UHDBT. There should be at least one entry in this list.
UHDBT contains an LL (WORD) which points to a Trace Table used
by the DEBUGGER for "stepping" through a Function or
Procedure contained in this Unit. The length of this
table (in bytes) is obtained by subtracting UHDBT from the
word UHENC. The result may be zero in which case this
table is empty.
UHENC (called UHZDA elsewhere in this report) is a WORD that
contains the total byte count of the Dictionary Area for this
unit. All bytes up to and including the Trace Table are
included in this count.
UHZCS is a WORD that contains the total byte count of all CODE
Segments compiled into this Unit.
UHZDT is a WORD that contains the total byte count of all Typed
CONST, DMT and VMT DATA Segments compiled into this unit.
UHZFA is a WORD that contains the total byte count of the Fix-Up
Data Table for this unit for CODE (CSegs).
UHZFT is a WORD that contains the total byte count of the Fix-Up
Data Table for Typed CONST's. This usually implies that a
VMT or DMT is getting its pointers relocated.
UHZFV is a WORD that contains the total byte count of all GLOBAL
VAR DATA Segments compiled into this unit.
UHDHT contains an LL (WORD) which points to a Hash Table which
is the root of the DEBUGGER Dictionary. If Local Symbols
were generated by the compiler (directive {$L+}) then ALL
symbols declared in the unit can be accessed from this
Hash Table. If Local Symbols were suppressed there is no
such Dictionary and the LL stored here points to the
INTERFACE Dictionary.
UHSOV This word contains flags. I have only been able to expose
a few of the values with any real confidence. Here's what
I know so far (expressed by bit numbers 15..0):
15..13: always zero?
12: always zero for Version 6.0 (DOS) Compiler?
1=DISCARDABLE, 0=PERMANENT Windows Segment.
11..7: always zero?
6: always zero for Version 6.0 (DOS) Compiler?
1=PRELOAD, 0=DEMANDLOAD Windows Segment.
5: always zero?
4: always zero for Version 6.0 (DOS) Compiler?
1=MOVEABLE, 0=FIXED Windows Segment.
3: always zero?
2: 0=DOS Compiler, 1=WINDOWS Compiler?
1: 1=DOS Compiler with {$O+}, else zero?
0: Unclear. Seems to imply that either this unit,
or one that it references requires emulation
support but this is only a guess.
UHPad begins a series of ten (10) words that are apparently
reserved for future use. Nothing but zeros has ever been
seen here by this author.
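Two of the recurring operations in the field descriptions, finding a map's length by subtracting adjacent LL's and testing individual UHSOV bits, can be sketched as below. The function names and all sample values are hypothetical, and the bit meanings are only the tentative ones listed under UHSOV.

```pascal
program HdrFields;

{ Length in bytes of a map that starts at StartLL and is followed by NextLL. }
{ Assumes NextLL >= StartLL, as it is for adjacent, populated header fields. }
function MapLength(StartLL, NextLL: Word): Word;
begin
  MapLength := NextLL - StartLL;
end;

{ True when bit number Bit (0..15) of Flags is set. }
function BitSet(Flags: Word; Bit: Byte): Boolean;
begin
  BitSet := (Flags and (Word(1) shl Bit)) <> 0;
end;

var
  UHPMT, UHCMT, UHSOV: Word;
begin
  UHPMT := $0400;  UHCMT := $0420;   { hypothetical header values }
  UHSOV := $1054;                    { hypothetical: bits 12, 6, 4 and 2 set }
  WriteLn('PROC Map bytes   : ', MapLength(UHPMT, UHCMT));
  WriteLn('Windows compiler?: ', BitSet(UHSOV, 2));
  WriteLn('DISCARDABLE?     : ', BitSet(UHSOV, 12));
  WriteLn('PRELOAD?         : ', BitSet(UHSOV, 6));
  WriteLn('MOVEABLE?        : ', BitSet(UHSOV, 4));
  WriteLn('DOS with {$O+}?  : ', BitSet(UHSOV, 1));
end.
```

A result of zero from MapLength corresponds to the "empty" case noted for several of the maps.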
4.2 UNIT SIZE
An independent check on the size of the .TPU file is available using
information contained in the Unit Header. This is also important for
.TPL (Unit Library) organization. To compute the file size, refer to
the five (5) words -- UHZDA, UHZCS, UHZDT, UHZFA, and UHZFT. Round
the contents of each of these words to the lowest multiple of 16 that
is greater than or equal to the content of that word. Then form the
sum of the rounded words. This is the .TPU file size in bytes -- a
LongInt result.
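The rule just given can be written out as follows. This is a sketch only: the function names are invented here and the five sample sizes are made up rather than taken from a real unit.

```pascal
program UnitSize;

{ Round a 16-bit byte count up to the next multiple of 16. }
{ The result is a LongInt because 65535 rounds up to 65536. }
function RoundUp16(N: Word): LongInt;
begin
  RoundUp16 := ((LongInt(N) + 15) div 16) * 16;
end;

{ Sum of the five rounded section sizes = expected .TPU file size. }
function TPUFileSize(ZDA, ZCS, ZDT, ZFA, ZFT: Word): LongInt;
begin
  TPUFileSize := RoundUp16(ZDA) + RoundUp16(ZCS) + RoundUp16(ZDT) +
                 RoundUp16(ZFA) + RoundUp16(ZFT);
end;

begin
  { hypothetical sizes: 1234 -> 1248, 500 -> 512, 16 -> 16, 37 -> 48, 0 -> 0 }
  WriteLn('Expected file size: ', TPUFileSize(1234, 500, 16, 37, 0));
end.
```

For the sample values the sum is 1248 + 512 + 16 + 48 + 0 = 1824 bytes, which can be compared against the size reported by the operating system.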
A Unit MAY be larger than 64K bytes. I finally tumbled to this when I
began to analyze the Windows Unit "WOBJECTS". I now feel that each of
the sections referenced by the sizes above may be up to 64K bytes
long. This implies an upper limit for unit size of around 320K bytes.
My face is actually quite red over this. Since a Unit has always been
capable of producing a 64K Code Segment not to mention a Data Segment
of nearly the same size, I can't explain why the significance of these
"size" words didn't dawn on me sooner.
5. SYMBOL DICTIONARIES
This area contains all available documentation of declared symbols and
procedure blocks defined within the unit. Depending on compiler
options in effect when the unit was compiled, this section will
contain at a minimum, the INTERFACE declarations, and at a maximum,
ALL declarations. The information stored in the dictionary is highly
dependent on the context of the symbol declared. We defer further
explanation to the appropriate section which follows.
5.1 ORGANIZATION
A dictionary is organized with a Hash Table as its root. The hash
table is used to provide rapid access to identifiers.
A dictionary may be thought of as a directed graph. Each subgraph is
rooted in a hash table. There may be a great many hash tables in a
given unit and their number depends on unit complexity as well as the
options chosen when the unit was compiled. Use of the {$L+} directive
produces the largest dictionaries. The hash tables are explained in
detail a few sections further on.
Hash tables point to Name Entries. When two or more symbols produce
the same hash function result, a "collision" is said to occur.
Collisions are resolved by the time-honored method of chaining
together the Name Entries of those symbols having the same hash
function result. Dictionary supersetting is accomplished using these
chains.
5.2 INTERFACE DICTIONARY
The INTERFACE dictionary contains all symbols and the necessary
explanatory data for the INTERFACE section of a Unit. Symbols get
added to the Unit using increasing storage addresses until the
IMPLEMENTATION section is encountered.
5.3 DEBUG DICTIONARY
The Debug dictionary (if present) is a superset of the INTERFACE
dictionary. It is used by the Turbo Debugger to support its many
features when tracing through a unit. If present, this dictionary is
rooted in its own hash table. The hash table is effectively
initialized when the IMPLEMENTATION keyword is processed by the
compiler. This takes the form (initially) of an unmodified copy of
the INTERFACE hash table, to which symbols are added in the usual
fashion. Thus, the hash chains constructed or extended at this time
lead naturally to the INTERFACE chains and this is how the superset is
effectively implemented.
5.4 DICTIONARY ELEMENTS
The dictionary contains four major elements. These are: hash tables,
Name Entries, Name Stubs and Type Descriptors. The distinction
between Name Entries and Name Stubs might appear to be rather
arbitrary. They might just as easily be regarded as a single element
(such as symbol entry). However, the case for the separate entity
approach is strong since Stubs are DIRECTLY addressed via LG's and --
more to the point -- ONLY by LG's. Thus, it seems reasonable that
this is a separate and very important structure -- at least in the
minds of the architects at Borland.
5.4.1 HASH TABLES
As has been intimated, Hash Tables are the glue that binds the
dictionary entries together and gives the dictionary its "shape".
They effectively implement the scope rules of the language and speed
access to essential information.
Each Hash table begins with a 2-byte size descriptor. This descriptor
contains the number of bytes in the table proper (less 2). Thus, the
descriptor directly points to the last bucket in the hash table. For
a hash table of 128 bytes, the size descriptor contains 126. The
first bucket in the table immediately follows the size descriptor.
5.4.1.1 SIZE
So far, three different hash table sizes have been observed. The
INTERFACE and DEBUG hash tables are usually 128 bytes (64 entries) in
size plus 2 bytes of size description, but the SYSTEM.TPU unit is a
special case, containing only 16 entries. Hash tables which anchor
subgraphs whose scope is relatively local usually contain four (4)
entries (8 bytes).
Graphically, a Hash Table with four slots has the following layout:
┌────────────────────┐
│ 0006h │ Size Descriptor
├════════════════════┤
│ slot 0 │ an LL or zero
├────────────────────┤
│ slot 1 │ an LL or zero
├────────────────────┤
│ slot 2 │ an LL or zero
├────────────────────┤
│ slot 3 │ an LL or zero
└────────────────────┘
It should be noted that the Size Descriptor furnishes an upper bound
for the hash function itself. Thus, it seems possible that a single
hash function is used for all hash tables and that its result is ANDed
with the Size Descriptor to get the final result. Because the sizes
are chosen as they are (powers of 2) this is feasible. Note that in
the above example, 6 = 2 * (n - 1) where n = 4 {slot count}. All of
the hash tables observed so far have this property.
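The layout and the masking idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names read_hash_table and slot_offset are hypothetical, the raw hash value is taken as given (the real hash function is not known), words are little-endian as on the 8086, and the sample table is invented.

```python
import struct

def read_hash_table(buf, pos):
    """Return (size_descriptor, slots) for the hash table starting at buf[pos]."""
    (size_desc,) = struct.unpack_from("<H", buf, pos)
    count = size_desc // 2 + 1            # descriptor = 2 * (n - 1)
    slots = struct.unpack_from("<%dH" % count, buf, pos + 2)
    return size_desc, list(slots)

def slot_offset(raw_hash, size_desc):
    """Assumed bucket selection: AND the raw hash with the Size Descriptor."""
    return raw_hash & size_desc

# Invented sample: a four-slot table with two occupied buckets.
table = struct.pack("<5H", 0x0006, 0x0000, 0x0120, 0x0000, 0x0044)
size_desc, slots = read_hash_table(table, 0)
print(size_desc, slots, slot_offset(0x1234, size_desc))
```

Because the descriptor always has bit 0 clear, the masked result is an even byte offset from the first bucket, which is what a table of 2-byte slots needs.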
One final note on this subject. Given these properties, "Folding" of
sparse hash tables is a rather trivial exercise so long as the new
hash table also contains a number of slots that is a power of 2. This
point is intriguing when one recalls that the System.TPU hash table
has only 16 slots rather than the usual 64.
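One way such folding could work is sketched below in Python. Whether Borland actually produced the SYSTEM table this way is not known. The function name fold is hypothetical, chains are modelled as plain lists instead of LL-linked Name Entries, and the bucket positions of the sample symbols are invented.

```python
def fold(buckets, new_count):
    """Fold a power-of-2 bucket list into new_count buckets by merging chains."""
    old_count = len(buckets)
    if old_count % new_count != 0 or new_count & (new_count - 1) != 0:
        raise ValueError("slot counts must be powers of 2, new <= old")
    folded = [[] for _ in range(new_count)]
    for index, chain in enumerate(buckets):
        # The same hash masked with the smaller table size lands here.
        folded[index & (new_count - 1)].extend(chain)
    return folded

# Invented sample: a sparse 64-slot table with three symbols.
sparse = [[] for _ in range(64)]
sparse[3] = ["ABS"]
sparse[19] = ["WRITELN"]
sparse[40] = ["EXIT"]
small = fold(sparse, 16)
```

Old slots 3 and 19 share new slot 3 because both indexes agree in their low four bits, so their chains are simply concatenated.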
5.4.1.2 SCOPE
The INTERFACE and Debug dictionary hash tables are Global in Scope
even though the symbols accessed directly via either hash table may be
private. On the other hand, other hash tables are purely local in
scope. For example, the fields declared within a record are reached
via a small local hash table, as are the arguments and local variables
declared within procedures and functions. Even OBJECTS use this
technique to provide access to Methods and Object Fields.
Access to such local scope fields/methods requires use of qualified
names which ensures conformity to Pascal scope rules. The method is
truly simple and elegant.
5.4.1.3 SPECIAL CASES
The SYSTEM.TPU Unit is a special case. Its INTERFACE hash table has
apparently been "hand-tuned" for small size and it contains only
sixteen (16) entries. I have always felt that "hand-coding" must have
been used to achieve the SYSTEM unit. The implications of the file
"SYSTEM.TPS" required for compilation of the SYSTEM unit seem to
support this opinion. Certainly, there are aspects of this unit that
appear conventional, but there is much that is unique and apparently
not the result of PASCAL coding. Library sources should help clarify
this. (See 2.2 SYSTEM UNIT on page 8)
5.4.2 NAME ENTRIES
This is the structure that anchors all information known by the
compiler about any symbol. The format is as follows:
DNameRec = RECORD
HLink : LL; { Hash Chain Link; Resolves Collisions }
DForm : Char; { Symbol Class }
DSymb : STRING[63]; { Text of Symbol (UPPER-CASE) }
END;
HLink: An LL which points to the next (previous) symbol in the
same unit which had the same hash function value.
DForm: A character that defines the class the symbol belongs to
and defines the format of the Name Stub which follows the
Name Entry. If the symbol is declared in the component
list of the "private" part of an Object declaration, then
this character is modified by adding $80 to its ordinal
value. Thus, an ordinary Function, Procedure or Method is
of category "S" while a private Method is of category
Chr(Ord('S')+$80).
DSymb: A String (in the Pascal sense) of variable size that
contains the text of the symbol (in UPPER-CASE letters
only). The SizeOf function is not defined for these
strings since they are truncated to match the symbol size.
The "value" of the SizeOf function can be determined by
adding 1 to the first byte in the string. Thus,
Ord(Symbol[0])+1 is the expression that defines the Size
of the symbol string. Turbo Pascal defines a symbol as a
string of relatively arbitrary size, the most significant
63 characters of which will be stored in the dictionary.
Thus, we conclude that the maximum size of such a string
is 64 bytes.
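A Name Entry can be decoded with very little code. The Python sketch below is an illustration only: read_name_entry and walk_chain are hypothetical names, an LL is assumed to be a byte offset into the same buffer, a zero HLink is assumed to end a chain, and the sample bytes are invented.

```python
import struct

def read_name_entry(buf, pos):
    """Decode the Name Entry at buf[pos]; also return the offset of its stub."""
    (hlink,) = struct.unpack_from("<H", buf, pos)
    dform = buf[pos + 2]
    length = buf[pos + 3]                      # Pascal length byte
    text = buf[pos + 4 : pos + 4 + length].decode("ascii")
    entry = {
        "hlink": hlink,
        "class": chr(dform & 0x7F),            # class letter without the $80 flag
        "private": bool(dform & 0x80),         # set for private object members
        "symbol": text,
    }
    return entry, pos + 4 + length             # the Name Stub follows the string

def walk_chain(buf, ll):
    """Collect the symbols on one hash chain (assumes zero terminates it)."""
    names = []
    while ll != 0:
        entry, _ = read_name_entry(buf, ll)
        names.append(entry["symbol"])
        ll = entry["hlink"]
    return names

# Invented sample: a private method entry, then a two-entry chain.
raw = struct.pack("<HB", 0, ord("S") + 0x80) + bytes([4]) + b"DONE"
entry, stub_pos = read_name_entry(raw, 0)
chain = (b"\x00\x00" + struct.pack("<HB", 0, ord("R")) + b"\x01X"
         + struct.pack("<HB", 2, ord("Q")) + b"\x02YY")
```

The stub offset returned here is Ord(Symbol[0])+1 bytes past the start of DSymb, which matches the size rule given for the symbol string.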
5.4.3 NAME STUBS
Name Stubs immediately follow their respective Name Entries and their
format is determined by the class code in the Name Entry. The
function of the stub is to organize the information appropriate to the
symbol and provide a means of accessing additional information such as
type descriptors, constant values, parameter lists and nested scopes.
The format of each Stub is presented in the following sub-sections.
5.4.3.1 LABEL DECLARATIVES ("O")
This Stub consists of a WORD whose function is (as yet) unknown.
5.4.3.2 UN-TYPED CONSTANTS ("P")
Format is as follows (CASE fragment):
'P':( { --- For Untyped Constants --- }
sPTD : LG; { to type descriptor }
sPV1 : LongInt; { constant value - size variable }
);
sPTD: An LG which points to a Type Descriptor (usually in
SYSTEM.TPU). This establishes the minimum storage
requirement for the constant. The rules vary with the
type, but the size of the constant data field (which
follows) is defined using the Type Descriptor(s).
sPV1: The value of the constant. For ordinal types, this value
is stored as a LONGINT (size=4 bytes). For Floating-Point
types, the size is implicit in the type itself. For
String types, the size is determined from the length of
the string which is stored in the initial byte of the
constant.
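The size rules for sPV1 can be put into code as follows. This Python sketch is a reading of the rules above, not a verified decoder: read_const_value is a hypothetical name, the type class and size are assumed to have been fetched already from the Type Descriptor that sPTD points to, and the class codes are those listed for the Prefix Part in 5.4.4.2.

```python
import struct

ORDINAL_CLASSES = {0x0C, 0x0D, 0x0E, 0x0F}   # fixed-point, BOOLEAN, CHAR, enumerated
FLOAT_CLASSES = {0x0A, 0x0B}                 # 8087 types and REAL
STRING_CLASS = 0x09

def read_const_value(buf, pos, type_class, type_size):
    """Return (value, size_in_bytes) for the sPV1 field at buf[pos]."""
    if type_class in ORDINAL_CLASSES:
        (value,) = struct.unpack_from("<l", buf, pos)   # always a LONGINT
        return value, 4
    if type_class in FLOAT_CLASSES:
        # Size is implicit in the type; the raw bytes are returned undecoded.
        return bytes(buf[pos : pos + type_size]), type_size
    if type_class == STRING_CLASS:
        size = buf[pos] + 1                  # length byte plus the text
        return bytes(buf[pos + 1 : pos + size]), size
    raise ValueError("unhandled type class %#x" % type_class)
```

Floating-point values are left as raw bytes because the 6-byte REAL format has no direct Python equivalent.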
5.4.3.3 NAMED TYPES ("Q")
This Stub consists of an LG (4-bytes) that points to the Type
Descriptor for this symbol.
5.4.3.4 VARIABLES, FIELDS, TYPED CONSTS ("R")
This Stub contains information required to allocate and describe these
types of entities. The format and content is as follows:
'R': ( { -- Variable, Field, Object -- }
sRAM: Byte; { allocation method codes: }
sRVF: CASE sRAM: Byte Of
$02,$06,
$22,$26: (ROfs : Word; { allocation offset (BP) }
ROB : Word); { To Parent Scope/Zero }
$00,$01: (TOfs : Word; { allocation offset in map}
TOB : LL); { offset in VAR/CONST Map }
$03: (AFar : Pointer); { FAR Pointer to Location }
$08: (Bofs : Word; { Offset-Record Relative }
RChn : LL); { To Next Field/Method }
$10: (QLG : LG); { to Stub of Allocator }
END;
sRTD: LG); { to Type Descriptor }
sRAM: A one-byte flag that precisely identifies the class of the
item being described. The known values and their apparent
meanings follow:
$00 -> Global Variables (Allocated in DS);
$01 -> Typed Constants (Allocated in DS);
$02 -> Procedure LOCAL Variables on STACK;
$03 -> Variables at Absolute Addresses;
$06 -> ADDRESS Arguments allocated on STACK; (This is now
used only for SELF in Method calls;)
$08 -> Fields sub-allocated in RECORDS and OBJECTS, plus
METHODS declared for OBJECTS.
$10 -> Variable Equivalenced to another via the
Absolute Clause;
$22 -> Arguments whose VALUEs are passed on the stack;
$26 -> Arguments whose ADDRESSes are passed on the stack.
sRVF: Two words whose content varies with sRAM above. They are
shown as case variants in the following:
$02,$06,$22,$26: {stack locals and arguments}
sRVF.ROfs: Word -- Offset relative to either DS or BP.
sRVF.ROB: Word -- LL to Dict Header of Parent Scope, or zero.
$00,$01: {VAR's or typed CONSTs}
sRVF.TOfs: Word -- Offset relative to allocation area origin;
sRVF.TOB: Word -- Offset to entry in VAR/CONST Map for item
allocation;
$03: {Absolute Address Variable}
sRVF.AFar: POINTER -- FAR Pointer to Absolute Memory Address.
$08: {Record/Object Fields/Methods}
sRVF.BOfs: Word -- Allocation Offset within Record/Object;
sRVF.RChn: Word -- LL to next Field/Method.
$10: {Absolute Equivalences}
sRVF.QLG: LG -- LG to STUB of variable/parameter declaration
that actually establishes the allocation;
sRTD: An LG that locates the proper Type Descriptor for this
symbol.
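The variant layout of the "R" stub lends itself to a table-driven decoder. The Python sketch below is illustrative only: read_r_stub is a hypothetical name, LG values are kept as four raw bytes because their internal layout is described elsewhere in this paper, and the FAR pointer is assumed to be stored offset-first as is usual on the 8086.

```python
import struct

# Field names for the variants that hold two plain words.
VARIANTS = {
    0x00: ("TOfs", "TOB"), 0x01: ("TOfs", "TOB"),
    0x02: ("ROfs", "ROB"), 0x06: ("ROfs", "ROB"),
    0x22: ("ROfs", "ROB"), 0x26: ("ROfs", "ROB"),
    0x08: ("BOfs", "RChn"),
}

def read_r_stub(buf, pos):
    """Decode sRAM, the 4-byte sRVF variant and sRTD at buf[pos]."""
    sram = buf[pos]
    body = bytes(buf[pos + 1 : pos + 5])
    if sram in VARIANTS:
        fields = dict(zip(VARIANTS[sram], struct.unpack("<HH", body)))
    elif sram == 0x03:
        offset, segment = struct.unpack("<HH", body)
        fields = {"AFar": (segment, offset)}
    elif sram == 0x10:
        fields = {"QLG": body}               # LG left undecoded
    else:
        fields = {"raw": body}               # code not listed above
    return {"sRAM": sram, "sRTD": bytes(buf[pos + 5 : pos + 9]), **fields}

# Invented samples: a record field and an absolute variable at 0040:006C.
field = read_r_stub(bytes([0x08]) + struct.pack("<HH", 4, 0x0210)
                    + b"\x10\x00\x01\x00", 0)
absolute = read_r_stub(bytes([0x03]) + struct.pack("<HH", 0x006C, 0x0040)
                       + bytes(4), 0)
```

The whole stub occupies nine bytes in every variant: one for sRAM, four for sRVF and four for sRTD.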
5.4.3.5 SUBPROGRAMS & METHODS ("S")
Subprograms (PROC's), especially since Object Methods are supported,
have a rather involved stub. Its format is as follows:
'S': ( { ------ User Subprograms ----- }
sSTp : Byte; { BIT Encoded Flags }
sSxx : Byte; { More Attribute Flags? }
sSPM : Word; { Code byte count if INLINE, }
{ else, offset to PROC Map }
sSPS : LL; { to containing scope or zero }
sSHT : LL; { to local scope hash table }
sSVM : Word); { VMT Offset-VIRTUAL Method PTR }
sSTp: A byte that contains bit-switches that seem to describe
the Call Model and imply the size of this stub. These
switches determine what kind of code (if any) is generated
when the PROC is referenced. The observed values are as
follows:
xxxxx001 -> PROC uses FAR Call Model;
xxxx0010 -> PROC uses INLINE Model (no Call);
xxxx0100 -> PROC uses INTERRUPT Model (no Call);
xxxx100x -> PROC has EXTERNAL attribute;
xxx1xxxx -> PROC uses METHOD Call Model;
x011xxxx -> PROC is a CONSTRUCTOR Method;
x101xxxx -> PROC is a DESTRUCTOR Method;
1xxxxxxx -> PROC has ASSEMBLER directive.
sSxx: A byte whose function is not yet fully known. In the
Windows compiler it is copied into the PROC Map -
presumably for use by the linker or debugger. Bit
positions firmly established are as follows (7..0):
7-6: always zero?
5: ????
4: Dynamic Call Model using DMT
3-2: 11 = DLL PROC Referenced by NAME
01 = DLL PROC Referenced by INDEX
1: ????
0: always zero???
sSPM: A Word whose interpretation depends on whether or not we
have an INLINE Declarative Subprogram. If this is an
INLINE Declarative Subprogram, then this word contains the
byte-count of the INLINE code text at the end of this
stub. Otherwise, this word is the offset within the PROC
Map that locates the object code for this Subprogram.
sSPS: A Word that contains an LL which locates the containing
scope in the dictionary, or zero if none.
sSHT: A Word that contains an LL which locates the local Hash
Table for this scope. A local hash table provides access
to all formal parameters of the Subprogram as well as all
Symbols whose declarations are local to the scope of this
Subprogram.
sSVM: A Word that is zero unless the symbol is a Virtual Method.
In that case, the content is the offset within the VMT for
the owning object at which the FAR POINTER to this Virtual
Method is stored.
+0A: A complete Type-Descriptor for this Subprogram. The
length is variable and depends upon the number of Formal
Parameters declared in the header. (See 5.4.4.3.5 on page
34).
+??: If this Symbol represents an INLINE Declarative
Subprogram, then the object-code text begins here. The
byte-count of the text is stored in sSPM in this stub.
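The fixed part of the "S" stub and the sSTp bit patterns can be decoded as sketched below in Python. The names decode_stp and read_s_stub are hypothetical, the bit patterns are copied literally from the table of observed values (with "x" as a wildcard), and the sample stub is invented.

```python
import struct

STP_PATTERNS = [
    ("xxxxx001", "FAR"),
    ("xxxx0010", "INLINE"),
    ("xxxx0100", "INTERRUPT"),
    ("xxxx100x", "EXTERNAL"),
    ("xxx1xxxx", "METHOD"),
    ("x011xxxx", "CONSTRUCTOR"),
    ("x101xxxx", "DESTRUCTOR"),
    ("1xxxxxxx", "ASSEMBLER"),
]

def decode_stp(flags):
    """Return the names of every observed pattern that matches the flag byte."""
    bits = format(flags, "08b")
    return [name for pattern, name in STP_PATTERNS
            if all(p in ("x", b) for p, b in zip(pattern, bits))]

def read_s_stub(buf, pos):
    """Decode the 10-byte fixed part; the Type Descriptor follows at +0A."""
    stp, sxx, spm, sps, sht, svm = struct.unpack_from("<BBHHHH", buf, pos)
    return {"sSTp": decode_stp(stp), "sSxx": sxx, "sSPM": spm,
            "sSPS": sps, "sSHT": sht, "sSVM": svm,
            "inline": "INLINE" in decode_stp(stp),
            "type_descriptor_at": pos + 0x0A}

# Invented sample: a FAR constructor whose VMT slot is at offset 8.
stub = read_s_stub(struct.pack("<BBHHHH", 0x31, 0, 0x0018, 0x0100, 0x0140, 8), 0)
```

When the "inline" entry is true, sSPM should be read as a byte count for the code text that follows the Type Descriptor, not as a PROC Map offset.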
5.4.3.6 TURBO STD PROCEDURES ("T")
This Stub consists of two bytes, the first of which is unique for each
procedure and increments by 4. I have found nothing in the SYSTEM
unit (which is where this entry appears) that this seems directly
related to. The second byte is always zero.
5.4.3.7 TURBO STD FUNCTIONS ("U")
This Stub consists of two bytes, the first of which is unique for each
function and increments by 4. I have found nothing in the SYSTEM unit
(which is where this entry appears) that this seems directly related
to. I wouldn't be surprised if this byte were an index into a TURBO
compiler table that points to specialized parse tables/action routines
for handling these functions and their non-standard parameter lists.
The second byte seems to be a flag having the values $00, $40 and $C0.
I strongly suspect that the flag $C0 marks exactly those functions
which may be evaluated at compile-time. The meaning behind the other
values is not known to me.
5.4.3.8 TURBO STD "NEW" ROUTINE ("V")
This Stub consists of a WORD whose function is (as yet) unknown. This
is the only Standard Turbo routine that can behave as a procedure as
well as a function (returning a pointer value).
5.4.3.9 TURBO STD PORT ARRAYS ("W")
This Stub consists of a byte whose value is 0 for byte arrays, and 1
for word arrays.
5.4.3.10 TURBO STD EXTERNAL VARIABLES ("X")
This Stub consists of an LG (4-bytes) that points to the Type
Descriptor for this symbol. (These are used for the arrays MEM, MEMW
and MEML.)
5.4.3.11 UNITS ("Y")
Unit Stubs have the following content:
+00: A Word apparently reserved for use by the Compiler or
Linker.
+02: A Word that seems to contain some kind of "signature" used
to detect inconsistent Unit Versions. Borland calls this
a "unit version number, which is basically a checksum of
the interface part." I have seen a thread in CIS which
says that it is a CRC value. Food for thought?
+04: A Word that contains an LL which locates the Successor
Unit in the "Uses" list. In fact, the "Uses" lists of
both the INTERFACE and IMPLEMENTATION sections of the Unit
are merged by this Word into a single list. A value of
zero is used to indicate no successor.
+06: A Word that contains an LL which locates the Predecessor
Unit in the "Uses" list. For the SYSTEM unit entry, this
value is always zero to indicate no predecessor. For the
Unit being compiled, this LL locates the final Unit in the
combined "Uses" list.
In effect, the two LL's at offsets 0004 and 0006 organize the units
into both forward and backward linked chains. The entry for the unit
being compiled is effectively the head of both the forward and the
backward chains. The final unit in the merged "Uses" list is the tail
of the forward chain, and the SYSTEM unit is the tail of the backward
chain.
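The two chains can be followed with a single routine, sketched here in Python. Everything in the sample buffer is invented, including the unit names and offsets. The helper names entry and walk are hypothetical, an LL is assumed to be a byte offset into the same buffer, and the layout of the Name Entry in front of each stub follows 5.4.2.

```python
import struct

def entry(name, succ, pred):
    """Build a 'Y' Name Entry followed by its Unit stub (sample data only)."""
    text = name.encode("ascii")
    return (struct.pack("<HB", 0, ord("Y")) + bytes([len(text)]) + text
            + struct.pack("<HHHH", 0, 0, succ, pred))

def walk(buf, ll, field):
    """Follow 'successor' (+04) or 'predecessor' (+06) links until zero."""
    names = []
    while ll != 0:
        length = buf[ll + 3]
        names.append(buf[ll + 4 : ll + 4 + length].decode("ascii"))
        stub = ll + 4 + length               # stub follows the name string
        succ, pred = struct.unpack_from("<HH", buf, stub + 4)
        ll = succ if field == "successor" else pred
    return names

# MYUNIT at offset 2, SYSTEM at 20, DOS at 38 (two pad bytes keep LL non-zero).
buf = (b"\x00\x00" + entry("MYUNIT", 20, 38)
       + entry("SYSTEM", 38, 0) + entry("DOS", 0, 20))
forward = walk(buf, 2, "successor")
backward = walk(buf, 2, "predecessor")
```

In this sample the forward walk ends at the last unit of the merged "Uses" list and the backward walk ends at SYSTEM, as described above.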
5.4.4 TYPE DESCRIPTORS
Type Descriptors store much of the semantic information that applies
to the symbols declared in the unit. Implementation details can be
managed using high-level abstractions and these abstractions can be
shared.
5.4.4.1 SCOPE
Type Descriptor sharing can occur across the boundaries which are
implicit in unit modules. Thus, a type defined in one unit may be
"imported" by some other module. Also, the pre-defined Pascal Types
(plus the Turbo Pascal extensions) are defined in the SYSTEM.TPU unit
and there needs to be a means of "importing" such Type Descriptors
during compilation. This is precisely the objective of the LG locator
(see Section 3.2 on Page 9). Type Descriptors are NEVER copied
between units. The binding always occurs by reference at compile time
and this helps support the technique of modifying a unit and compiling
it to a .TPU file, then re-compiling all units/programs that "USE" it.
Type Descriptors have many roles so their format varies. We have
divided these structures into two parts: the PREFIX Part, which is
always present and whose format is fairly constant, and the SUFFIX
Part, whose content and format depend on the attributes that are part
of the type definition.
5.4.4.2 PREFIX PART
The Prefix Part of every Type Descriptor consists of six (6) bytes.
The usage is consistent for all types observed by this author and the
format is as follows:
+00: A Byte that identifies the format of the Suffix part.
This is essentially based on several high-level categories
which the Suffix Parts support directly. The observed set
of values is as follows:
00h -> an un-typed entity;
01h -> an ARRAY type;
02h -> a RECORD type;
03h -> an OBJECT type;
04h -> a FILE type (other than TEXT);
05h -> a TEXT File type;
06h -> a SUBPROGRAM type;
07h -> a SET type;
08h -> a POINTER type;
09h -> a STRING type;
0Ah -> an 8087 Floating-Point type;
0Bh -> a REAL type;
0Ch -> a Fixed-Point ordinal type;
0Dh -> a BOOLEAN type;
0Eh -> a CHAR type;
0Fh -> an Enumerated ordinal type.
+01: A Byte used as a modifier. Since the above scheme is too
general for machine-dependent details such as storage
width and sign control, this modifier byte supplies
additional data. The author has identified several cases
in which this information is vital but has not spent very
much time on the subject. The chief areas of importance
seem to be in the 8087 Floating-Point types, and the
Fixed-Point ordinal types. The semantics seem to be as
follows:
0A 00 -> The type "SINGLE"
0A 02 -> The type "EXTENDED"
0A 04 -> The type "DOUBLE"
0A 06 -> The type "COMP"
0C 00 -> an un-named BYTE integer
0C 01 -> The type "SHORTINT"
0C 02 -> The type "BYTE"
0C 04 -> an un-named WORD integer
0C 05 -> The type "INTEGER"
0C 06 -> The type "WORD"
0C 0C -> an un-named double-word integer
0C 0D -> The type "LONGINT"
+02: A Word that contains the number of bytes of storage that
are required to contain an object/entity of this type.
For types that represent variable-length objects/entities
such as strings, this word may define the value returned
by the SIZEOF function as applied to the type.
This word is probably of value during compilation of
un-typed CONST's since the size of their Stubs depends on this
field. For STRING types however, the length descriptor is
part of the string itself.
+04: A Word that is zero (for DOS units) unless the descriptor
is for an Object Method. In this case, the content is an
LL to the Name Entry of the SUCCEEDING Method for the
Object, in order of declaration, or zero if none. Some
Windows units (e.g., SYSTEM) have non-zero values here
whose function is not known.
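The six-byte Prefix Part maps directly onto a fixed record. The Python sketch below decodes it using only the category and modifier values tabulated above; read_prefix is a hypothetical name and the sample bytes are invented.

```python
import struct

CATEGORIES = [
    "un-typed", "ARRAY", "RECORD", "OBJECT", "FILE", "TEXT", "SUBPROGRAM",
    "SET", "POINTER", "STRING", "8087 float", "REAL", "fixed-point ordinal",
    "BOOLEAN", "CHAR", "enumerated",
]
MODIFIERS = {
    (0x0A, 0x00): "SINGLE", (0x0A, 0x02): "EXTENDED",
    (0x0A, 0x04): "DOUBLE", (0x0A, 0x06): "COMP",
    (0x0C, 0x00): "un-named byte", (0x0C, 0x01): "SHORTINT",
    (0x0C, 0x02): "BYTE", (0x0C, 0x04): "un-named word",
    (0x0C, 0x05): "INTEGER", (0x0C, 0x06): "WORD",
    (0x0C, 0x0C): "un-named double-word", (0x0C, 0x0D): "LONGINT",
}

def read_prefix(buf, pos):
    """Decode the Prefix Part at buf[pos]; the Suffix Part starts 6 bytes on."""
    code, modifier, size, next_method = struct.unpack_from("<BBHH", buf, pos)
    return {
        "category": CATEGORIES[code] if code < len(CATEGORIES) else "unknown",
        "name": MODIFIERS.get((code, modifier)),   # None if not tabulated
        "size": size,
        "next_method": next_method,
        "suffix_at": pos + 6,
    }

# Invented sample: the prefix one would expect for the type INTEGER.
prefix = read_prefix(struct.pack("<BBHH", 0x0C, 0x05, 2, 0), 0)
```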
5.4.4.3 SUFFIX PARTS
Suffix Parts further refine the implementation details of the type and
also provide subrange constraints where appropriate. In some cases
the Suffix part is empty since all semantic data for the type is
contained in the Prefix part.
5.4.4.3.1 UN-TYPED
This Suffix Part is empty. Nothing is known about an un-typed entity.
5.4.4.3.2 STRUCTURED TYPES
The structured types represent aggregates of lower-level types. We
include ARRAY, RECORD, OBJECT, FILE, TEXT, SET, POINTER and STRING
types in this category.
5.4.4.3.2.1 ARRAY TYPES
The Suffix Part of the ARRAY type is so constructed as to be able to
support recursive or nested definition of arrays. The suffix format
is as follows:
+00: An LG that locates the Type Descriptor for the "base-type"
of the array. This is the type of the entity being
arrayed (which may itself be an array).
+04: An LG that locates the Type Descriptor for the array
bounds which is a constrained ordinal type or subrange.
5.4.4.3.2.2 RECORD TYPES
RECORD types have nested scopes. The Suffix part provides a base
structure by which to locate the fields local to the scope of the
Record type itself. The format is as follows:
+00: A Word containing an LL which locates the local Hash Table
that provides access to the fields in the nested scope.
+02: A Word containing an LL which locates the Name Entry of
the initial field in the nested scope. This supports a
"left-to-right" traversal of the fields in a record.
5.4.4.3.2.3 OBJECT TYPES
OBJECT types also have nested scopes. The Suffix part provides a base
structure by which to locate the fields and METHODS local to the scope
of the OBJECT type itself. In addition, inheritance and VMT
particulars are stored. The format is as follows:
+00: A Word containing an LL which locates the local Hash Table
that provides access to the fields and METHODS local to
the nested scope.
+02: A Word containing an LL which locates the Name Entry of
the initial field or METHOD in the nested scope. This
supports a "left-to-right" traversal of the fields and
METHODS in an OBJECT.
+04: An LG which locates the Type Descriptor of the Parent
Object. This field is zero if there is no such Parent.
+08: A Word which contains the size in bytes of the VMT for
this Object. This field is zero if the object employs no
Virtual Methods, Constructors or Destructors.
+0A: A Word which contains the offset within the CONST DSeg Map
that locates the VMT skeleton or template segment. This
field equals FFFFh if the object employs no Virtual
Methods, Constructors or Destructors.
+0C: A Word which contains the offset within an Object instance
where the NEAR POINTER to the VMT for the object is stored
(within the DATA SEGMENT). This field equals FFFFh if the
object employs no Virtual Methods, Constructors or
Destructors.
+0E: A Word which contains an LL which locates the Name Entry
for the name of the OBJECT itself.
+10: A Word containing $FFFF in DOS units. In WINDOWS units
this word contains the offset within the CONST DSeg Map
that locates the DMT skeleton or template segment. This
field equals FFFFh if the object employs no Dynamic
Methods.
+12: Three Words (not yet understood) containing zeroes.
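Taken together, the offsets above describe a 24-byte suffix. A Python sketch of a decoder follows. The name read_object_suffix and the dictionary keys are hypothetical, the parent LG is kept as four raw bytes, and the sample describes an invented object with no parent and no virtual methods.

```python
import struct

OBJECT_SUFFIX = "<HH4sHHHHH3H"      # 24 bytes, offsets +00 through +17

def read_object_suffix(buf, pos):
    """Decode the OBJECT Suffix Part at buf[pos]."""
    (hash_ll, first_ll, parent, vmt_size, vmt_map, vmt_ptr,
     name_ll, dmt_map, *unknown) = struct.unpack_from(OBJECT_SUFFIX, buf, pos)
    return {
        "hash_table": hash_ll,
        "first_member": first_ll,
        "parent_lg": parent,
        "has_parent": parent != bytes(4),
        "vmt_size": vmt_size,
        "vmt_map_offset": None if vmt_map == 0xFFFF else vmt_map,
        "vmt_ptr_offset": None if vmt_ptr == 0xFFFF else vmt_ptr,
        "name_entry": name_ll,
        "dmt_map_offset": None if dmt_map == 0xFFFF else dmt_map,
        "unknown": unknown,
    }

# Invented sample: a plain object without VMT or DMT.
raw = struct.pack(OBJECT_SUFFIX, 0x0200, 0x0210, bytes(4),
                  0, 0xFFFF, 0xFFFF, 0x01F0, 0xFFFF, 0, 0, 0)
obj = read_object_suffix(raw, 0)
```

Mapping FFFFh to None keeps the "no VMT" and "no DMT" cases from being mistaken for real offsets.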
5.4.4.3.2.4 FILE (NON-TEXT) TYPES
This Suffix consists of an LG that locates the Type Descriptor of the
base type of the file. Note that the Type Descriptor may be that of
an un-typed entity (for un-typed files).
5.4.4.3.2.5 TEXT FILE TYPES
This Suffix consists of an LG that locates the Type Descriptor of the
base type of the file -- in this case SYSTEM.CHAR.
5.4.4.3.2.6 SET TYPES
This Suffix consists of an LG that locates the base-type of the set
itself. Pascal limits such entities to simple ordinals whose
cardinality is limited to 256.
5.4.4.3.2.7 POINTER TYPES
This Suffix consists of an LG that locates the base-type of the entity
pointed at.
5.4.4.3.2.8 STRING TYPES
This is a special case of an ARRAY type. The format is as follows:
+00: An LG to the Type Descriptor SYSTEM.CHAR which is the base
type of all Turbo Pascal Strings.
+04: An LG to the Type Descriptor for the array bounds
constraints for the string. When the unconstrained STRING
type is used, this points to SYSTEM.BYTE which is defined
as a subrange 0..255.
5.4.4.3.3 FLOATING-POINT TYPES
The Suffix part for all Floating-Point types is EMPTY. All data
needed to specify these approximate number types is contained in the
Prefix part. The Types included in this class are SINGLE, DOUBLE,
EXTENDED, COMP and REAL.
5.4.4.3.4 ORDINAL TYPES
The Ordinal Types consist of the various "integer" types plus the
BOOLEAN, CHAR and Enumerated types.
5.4.4.3.4.1 "INTEGERS"
These types include BYTE, SHORTINT, WORD, INTEGER and LONGINT. Their
Suffix parts are identical in format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor of the largest
upward compatible type. This is the Type Descriptor that
is used to control the width of an un-typed constant in
the dictionary stub. For the "integer" types, this is an
LG to SYSTEM.LONGINT.
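The twelve-byte ordinal suffix can be read as shown in this Python sketch. The name read_ordinal_suffix is hypothetical, the bounds are assumed to be signed 32-bit values (which SHORTINT's negative lower bound would require), and the LG is again left as raw bytes with invented contents.

```python
import struct

def read_ordinal_suffix(buf, pos):
    """Return (lower, upper, compatible_lg) from the suffix at buf[pos]."""
    lower, upper, compat = struct.unpack_from("<ll4s", buf, pos)
    return lower, upper, compat

# Invented sample: the bounds one would expect for SHORTINT.
raw = struct.pack("<ll4s", -128, 127, b"\x34\x12\x01\x00")
lower, upper, compat = read_ordinal_suffix(raw, 0)
cardinality = upper - lower + 1
```

The same reader applies to the BOOLEAN, CHAR and enumerated suffixes, since they share the layout and differ only in what the LG at +08 refers to.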
5.4.4.3.4.2 BOOLEANS
This type Suffix has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor SYSTEM.BOOLEAN.
There is no "upward compatible" type.
5.4.4.3.4.3 CHARS
This type Suffix has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor SYSTEM.CHAR. There
is no "upward compatible" type.
5.4.4.3.4.4 ENUMERATIONS
This type Suffix is unusual and has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Prefix of the current Type
Descriptor. There is no upward compatible type.
What follows is a full-fledged SET Type Descriptor whose base type is
the Type Descriptor of the Enumerated Type itself. The author has not
yet discovered the reason for this.
At least one case has been observed where a set type descriptor is
followed by a word containing zero but I know of no explanation.
Could this be a (shudder) BUG in Turbo?
5.4.4.3.5 SUBPROGRAM TYPES
The length of this Suffix is variable. The format is as follows:
+00: An LG that locates the Type Descriptor of the FUNCTION
result returned by the Subprogram. This field is zero if
the Subprogram is a PROCEDURE.
+04: A Word that contains the number of Formal Parameters in
the Function/Procedure header. If non-zero, then this
word is followed by the parameter list itself as a simple
array of parameter descriptors.
The format of a parameter descriptor is as follows:
0000: An LG that locates the Type Descriptor of the
corresponding parameter;
0004: A Byte that identifies the parameter passing
mechanism used for this entry as follows:
02h -> VALUE of parameter is passed on STACK,
06h -> ADDRESS of parameter is passed on STACK.
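Since each parameter descriptor is five bytes long, the variable-length suffix is easy to step through. The Python sketch below shows one way; read_subprogram_suffix is a hypothetical name, LG values stay as raw bytes, and the sample header is invented.

```python
import struct

MODES = {0x02: "value", 0x06: "address"}

def read_subprogram_suffix(buf, pos):
    """Decode result type, parameter count and parameter list at buf[pos]."""
    result_lg, count = struct.unpack_from("<4sH", buf, pos)
    cursor = pos + 6
    params = []
    for _ in range(count):
        type_lg, mode = struct.unpack_from("<4sB", buf, cursor)
        params.append((type_lg, MODES.get(mode, hex(mode))))
        cursor += 5                          # LG (4 bytes) + mechanism byte
    return {"is_procedure": result_lg == bytes(4),
            "params": params,
            "end": cursor}                   # first byte past the suffix

# Invented sample: a procedure with one value and one VAR parameter.
raw = (struct.pack("<4sH", bytes(4), 2)
       + struct.pack("<4sB", b"\x10\x00\x01\x00", 0x02)
       + struct.pack("<4sB", b"\x20\x00\x01\x00", 0x06))
sub = read_subprogram_suffix(raw, 0)
```

For an INLINE subprogram, the "end" offset is where the object-code text described in 5.4.3.5 would begin.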
6. MAPS AND LISTS
The "MAPS and LISTS" are not part of the symbol dictionary. Rather,
these structures provide access to the Code and Data Segments produced
by the compiler or included via the {$L name.OBJ} directive. The
format and purpose (as understood by this author) of each of these
tables is explained in the following sections.
6.1 PROC MAP
The PROC Map provides a means of associating the various Function and
Procedure declarations with Code Segments and DLL's. There is some
evidence that the Compiler produces CODE (and DATA) Segments for EACH
of the Subprograms defined in the Unit as well as for the un-named
Unit Initialization code block. There is also evidence that EXTERNAL
PROCs must be assembled separately in order to exploit fully the Turbo
"Smart Linker" since Turbo Pascal places some significant restrictions
on EXTERNAL routines in the area of Segment Names and Types.
Specifically, only code segments named "CODE" and data segments named
"DATA" or "CONST" will be used by the "Smart Linker" as sources of
code and data for inclusion in a Turbo Pascal .EXE file. (Turbo 6.0
relaxed Name constraints but only one code segment per .OBJ remains a
limitation).
The first entry in the PROC Map is reserved for the Unit Initialization
block. If there is no Unit Initialization block, this entry will be
marked with $FFFF. In addition, each and every PROC in the Unit has
an entry in this table (except for INLINE procs).
If an EXTERNAL routine is included, then ALL PUBLIC PROC definitions
in that routine must be declared in the Unit Source Code with the
EXTERNAL attribute.
The size of the PROC Map Table (in Bytes) is implied in the Unit
Header by the LL's named UHPMT and UNCMT.
The Format of a single PROC Map Entry is as follows:
+00: A Word presumably reserved as a work area; always zero.
+02: A Word which contains Flags copied from sSxx in the Stub
for the Subprogram. This word is always zero for the DOS
compiler. (see 5.4.3.5, page 24)
+04: A Word that contains an offset within the CSeg Map. This
is used to locate the code segment containing the PROC.
If the PROC is found in a DLL, then this word is an offset
within the DLL List to the DLL name (i.e., the file with
the .DLL extension).
+06: A Word that contains an offset within the CODE Segment
that defines the PROC entry point relative to the load
point of the referenced CODE Segment if local to this
unit. For DLL PROCS referenced by "INDEX" this word is
the procedure "INDEX" number within the DLL. For DLL
PROCS referenced by "NAME" this word is an offset to that
name which is stored in the DLL List.
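
The entry layout above can be sketched in a few lines of Python. This
is an illustration only and is not taken from the supplied program:
the names ProcMapEntry and parse_proc_map are invented here, and
Words are assumed to be stored low byte first, as is usual on the
8086. Telling a DLL entry apart from a local one is left out, since
the text above does not pin down how that is done.

```python
import struct
from typing import List, NamedTuple


class ProcMapEntry(NamedTuple):
    work: int       # +00: work area, always zero
    flags: int      # +02: flags copied from the stub (zero for DOS)
    map_ofs: int    # +04: offset within the CSeg Map (or the DLL List)
    entry: int      # +06: entry point, DLL index, or DLL name offset


def parse_proc_map(table: bytes) -> List[ProcMapEntry]:
    """Split a raw PROC Map into its 8-byte entries."""
    if len(table) % 8:
        raise ValueError("PROC Map size is not a multiple of 8")
    return [ProcMapEntry(*words) for words in struct.iter_unpack("<4H", table)]
```

With this sketch, parse_proc_map(raw)[0] would describe the Unit
Initialization block and the remaining entries the PROCs.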
6.2 CSEG MAP
The CSeg Map provides a convenient descriptor table for each CODE
Segment present in the Unit and serves to relate these segments with
the Segment Relocation Data and the Segment Trace Table. It seems
reasonable to infer that the "Smart Linker" is able to include/exclude
code/data at the SEGMENT level only.
The CSeg Map is an array of fixed-length records whose format is as
follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes).
+04: A Word that contains the Length of the Fix-Up Data Table
for this Code Segment (in bytes).
+06: A Word that contains the offset of the Trace Table Entry
for this Segment (if it was compiled with DEBUG Support).
If there is no Trace Table for this segment, then this
Word contains FFFFh.
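
A rough Python sketch of reading this table follows. It is not the
supplied program's code; CSegEntry, parse_cseg_map and
segment_offsets are names made up for the illustration, and
little-endian Words are assumed. Because Section 7.1 says there are
no filler bytes between segments, a running sum of the lengths should
give each segment's position within the Object CSegs area.

```python
import struct
from typing import List, NamedTuple, Optional


class CSegEntry(NamedTuple):
    reserved: int               # +00: reserved for TURBO
    seg_len: int                # +02: segment length in bytes
    fixup_len: int              # +04: length of this segment's fix-up data
    trace_ofs: Optional[int]    # +06: Trace Table offset, None for FFFFh


def parse_cseg_map(table: bytes) -> List[CSegEntry]:
    """Split a raw CSeg Map into its 8-byte records."""
    entries = []
    for res, seg_len, fix_len, trace in struct.iter_unpack("<4H", table):
        entries.append(
            CSegEntry(res, seg_len, fix_len, None if trace == 0xFFFF else trace))
    return entries


def segment_offsets(entries: List[CSegEntry]) -> List[int]:
    """Start of each segment relative to the Object CSegs area."""
    offsets, pos = [], 0
    for e in entries:
        offsets.append(pos)
        pos += e.seg_len        # segments are packed back to back
    return offsets
```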
6.3 TYPED CONST DSEG MAP
The CONST DSeg Map provides a convenient descriptor table for each
DATA Segment which was spawned by the presence of Typed Constants or
VMT's in the Pascal Code. It serves to relate these segments with the
Segment Fix-Up (relocation) Data and with the Code Segments that refer
to these DATA elements. One entry is present for each CONST
declaration part containing typed constants and for each CONST segment
linked from an ".OBJ" file. The CONST DSeg Map is an array of fixed-
length records whose format is as follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes).
+04: A Word that contains the Length of the Fix-Up Data Table
for this DATA Segment (in bytes).
+06: A Word that contains an LL which locates the OBJECT that
owns this VMT or DMT template or zero if the segment is
not a VMT or DMT template.
One can determine the defining block for a Typed Constant declaration
and our program attempts to do just that. A by-product of the
dictionary mapping algorithm allows the declaring block to be found
and its qualified name printed. This information is also used to
explain fix-up data as to its source. Results will be incomplete
unless a really comprehensive dictionary is present in the unit.
6.4 GLOBAL VAR DSEG MAP
The VAR DSeg Map provides a convenient descriptor table for each DATA
Segment present in the Unit.
One entry exists for each VAR declaration part whose scope is not
local to a PROC and so is allocated in the DATA Segment. CODE
Segments may have references to these in the CODE Fix-Up Data Table.
Each EXTERNAL CSeg having a segment named DATA also spawns an entry in
this table.
The VAR DSeg Map is an array of fixed-length records whose format is
as follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes). This
may be zero, especially if the EXTERNAL routine contains a
DATA segment whose sole purpose is to declare one or more
EXTRN symbols that are defined in some DATA segment
external to the Assembly.
+04: A Word apparently reserved for use by TURBO.
+06: A Word apparently reserved for use by TURBO.
One can determine the defining block for a Global VARiable declaration
and our program attempts to do just that. A by-product of the
dictionary mapping algorithm allows the declaring block to be found
and its qualified name printed. This information is also used to
explain fix-up data as to its source. Results will be incomplete
unless a really comprehensive dictionary is present in the unit. Such
DSegs can be referenced by many CSegs and we only locate the first
one. This is okay for Pascal code but it's ambiguous for assembler
since the names may be PUBLIC and referenced by more than one module.
6.5 DLL LIST
This list is present ONLY in Units compiled by the Windows Version and
then only if the unit calls Dynamic Link Library (DLL) PROCS. The DLL
List has the following format:
+00: Four (4) bytes of binary zeroes (reserved for work?).
+04: A variable-sized String that contains the name of the DLL
MEMBER name or the PROC name (for DLL reference by NAME).
The string is truncated to actual size as usual for a
unit.
Procedures or Functions which reside in DLL's have entries in the PROC
map but NOT in the CSeg Map since the executable code is external.
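
Walking the DLL List might look like the Python sketch below. Two
things are assumed and not confirmed by anything above: that the
"variable-sized String" is an ordinary Pascal string (a length byte
followed by that many characters), and that offsets stored in the
PROC Map point at the start of an entry (the four zero bytes) and not
at the string itself. The function name is invented for the example.

```python
from typing import Dict


def parse_dll_list(data: bytes) -> Dict[int, str]:
    """Map each entry's offset within the DLL List to the name it holds."""
    names, pos = {}, 0
    while pos + 5 <= len(data):
        length = data[pos + 4]          # length byte follows 4 zero bytes
        start = pos + 5
        names[pos] = data[start:start + length].decode("ascii", "replace")
        pos = start + length            # entries are packed back to back
    return names
```

If the second assumption is wrong, keying the result by pos + 4
would be the obvious thing to try next.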
6.6 DONOR UNIT LIST
This list contains an entry for each Unit (taken from the "USES" list)
which MAY contribute either CODE or DATA to the executable file. Not
all units do make such a contribution as some exist merely to define a
collection of Types, etc. A Unit gets into this list if there exists
a single Fix-Up Data Entry that references CODE or DATA in that Unit.
The list is comprised of elements whose SIZE is variable and whose
format is as follows:
+00: A WORD apparently reserved for use by TURBO.
+02: A variable-length String containing the unit name.
6.7 SOURCE FILE LIST
This list contains an entry for each "source" file used to compile the
Unit. This includes the Primary Pascal file, files containing Pascal
code included by means of the {$I filename.xxx} compiler directive,
and .OBJ files included by the {$L filename.OBJ} compiler directive.
The order of entries in this list is critical since it maps the CODE
segments stored in the unit. The order of the entries is as follows:
The Primary Pascal file;
All Included Pascal files;
All Included .OBJ files.
Mapping of CSegs to files is done as follows:
Each .OBJ file contributes a SINGLE Code Segment (if any). Note
that this author has not observed an .OBJ module that
contains only a DATA Segment (but that seems a distinct
possibility).
The Primary Pascal file (augmented by all included Pascal Files)
contributes zero or more CODE Segments.
Therefore, there are at least as many CSeg entries as .OBJ files. If
more, then the excess entries (those at the front of the list) belong
to the Pascal files that make up the Pascal source for the unit.
The format of an entry in this list is as follows:
+00: A flag byte that indicates the type of file represented;
04h -> the Primary Pascal Source File,
03h -> an Included Pascal Source File,
05h -> an .OBJ file that contains a CODE segment
06h -> an .RES file from {$R xxx.RES} (Windows RESOURCE).
+01: A Word apparently reserved for use by the Compiler/Linker.
+03: A Word that is zero for .OBJ files and which contains the
file directory time-stamp for Pascal Files.
+05: A Word that is zero for .OBJ files and which contains the
file directory date-stamp for Pascal Files.
+07: A variable-sized string containing the filename and
extension of the file used during compilation.
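
The list format and the CSeg-to-file rule can be sketched together in
Python. This is only an illustration under stated assumptions: Words
are little-endian, the string is a Pascal string (length byte first),
and the time and date Words use the standard DOS directory packing.
It also assumes every entry flagged 05h really contributes one CSeg.
All names are invented for the example.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple

KINDS = {0x03: "include", 0x04: "primary", 0x05: "obj", 0x06: "res"}


class SourceFile(NamedTuple):
    kind: str
    stamp: Optional[Tuple[int, int, int, int, int, int]]
    name: str


def dos_stamp(date: int, time: int) -> Tuple[int, int, int, int, int, int]:
    """Unpack DOS directory date/time Words into (y, m, d, h, min, s)."""
    return (1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
            time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)


def parse_source_files(data: bytes) -> List[SourceFile]:
    files, pos = [], 0
    while pos + 8 <= len(data):
        # flag byte, reserved Word, time Word, date Word, string length
        flag, _, time, date, n = struct.unpack_from("<BHHHB", data, pos)
        name = data[pos + 8:pos + 8 + n].decode("ascii", "replace")
        stamp = dos_stamp(date, time) if (date or time) else None
        files.append(SourceFile(KINDS.get(flag, "unknown"), stamp, name))
        pos += 8 + n
    return files


def cseg_owners(n_csegs: int, files: List[SourceFile]) -> List[str]:
    """Name the file behind each CSeg Map entry, front to back."""
    objs = [f.name for f in files if f.kind == "obj"]
    n_pascal = n_csegs - len(objs)      # excess entries sit at the front
    if n_pascal < 0:
        raise ValueError("fewer CSegs than .OBJ files")
    return ["(Pascal source)"] * n_pascal + objs
```

Note that the sketch cannot say which Pascal file (primary or
included) produced a given CSeg; the Debug Trace Table is the place
to look for that.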
6.8 DEBUG TRACE TABLE
If Debug support was selected at compile time, then all Pascal code
which supports Debugging produces an entry in this table. The table
entries themselves are variable in size and have the following format:
+00: A Word which contains an LL that locates the Directory
Header of the Symbol (a PROC name) this entry represents.
+02: A Word which contains the offset (within the Source File
List) of the entry that names the file that generated the
CSeg being traced. This allows the file included by means
of the {$I filename} directive to be identified for DEBUG
purposes, as well as code produced from the Primary File.
+04: A Word containing the number of bytes of data that precede
the BEGIN statement code in the segment. For Pascal PROCS
these bytes consist of literal constants, un-typed
constants, and other data such as range-checking limits,
etc.
+06: A Word containing the Line Number of the BEGIN statement
for the PROC.
+08: A Word containing the number of lines of Source Code to
Trace in this Segment.
+0A: An array of bytes whose size is at least the number of
source code lines in the PROC. Each byte contains the
number of bytes of object code in the corresponding source
line. This appears to be an array of SHORTINT: if a
"line" contains more than 127 bytes, a single byte of $80
precedes the actual byte count as a sort of "escape", and
the next byte records the count (up to 255) for the line.
This situation has not yet been fully explored. We do not
yet know what happens in the event a line is credited with
spawning more than 255 bytes of code.
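
To make the byte array at +0A concrete, here is a small Python
sketch that turns it into (source line, code offset, byte count)
triples. It is a guess at a decoder built only from the description
above, not the supplied program's logic. The parameter names are
invented; first_line is the Word at +06, code_start the Word at +04
and n_lines the Word at +08. Any byte other than $80 is taken at face
value, since nothing is known about other values above 127.

```python
from typing import List, Tuple


def line_offsets(sizes: bytes, n_lines: int, first_line: int,
                 code_start: int) -> List[Tuple[int, int, int]]:
    """Return (source line, code offset, byte count) per traced line."""
    out, pos, offset = [], 0, code_start
    for i in range(n_lines):
        count = sizes[pos]
        pos += 1
        if count == 0x80:           # escape: real count is in the next byte
            count = sizes[pos]
            pos += 1
        out.append((first_line + i, offset, count))
        offset += count
    return out
```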
7. CODE, DATA, FIX-UP INFO
This area begins at the start of the next free PARAGRAPH. This means
that its offset from the beginning of the Unit ALWAYS ends in the
digit zero.
This area contains the CODE segments, CONST DATA segments, and the
Relocation (Fix-Up) Data required for linking.
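
Since a paragraph is 16 bytes, the start of a paragraph-aligned area
can be computed by rounding the end of the previous area up to a
multiple of 16. A one-line Python sketch (the function name is mine):

```python
def next_paragraph(offset: int) -> int:
    """Round a file offset up to the next 16-byte paragraph boundary."""
    return (offset + 15) & ~15
```

For example, an area ending at offset $0123 would be followed by one
starting at $0130, whose last hex digit is zero as noted above.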
7.1 OBJECT CSEGS
Each CODE segment included in the unit appears here as specified by
the CSeg Map Table. Depending on usage, these segments may appear in
the executable file. There are no filler bytes between segments.
7.2 CONST DSEGS
This section begins at the start of the first free PARAGRAPH following
the end of the Object CSegs. This means that its offset from the
beginning of the Unit ALWAYS ends in the digit zero.
A DATA segment fragment appears here for each CSeg that declares a
typed constant, and for each OBJECT which employs Virtual Methods,
Constructors or Destructors. There are no filler bytes between
segments.
If local symbols were generated, there is always enough information to
allow documenting the scope of the declaration as well as interpreting
the data in the display since the needed type declarations would also
be available. Our program merely identifies the defining block.
7.3 FIX-UP DATA TABLES
There are - at most - two Fix-Up Data Tables in any given .TPU file.
The first is for the CODE Area and the second is for the CONST DSeg
area. Both are paragraph aligned and both have size information in
the unit header.
Turbo Pascal for DOS and Turbo Pascal for Windows apparently utilize
differing code-generation models where floating-point is concerned.
The nub of the difference appears to lie in emulation support. In the
DOS product, the 8087 emulator is included in the SYSTEM unit while a
WINDOWS DLL (WIN87EM) furnishes floating-point emulation support for
applications. This seems to be the reason for a new fix-up format and
for the way floating-point options are presented in TP for Windows.
The Table consists of an array of eight (8) byte entries whose format
is as follows:
+00: A Byte containing the offset within the Donor Unit List of
the Unit name that this entry refers to. This can be the
compiled Unit or some previously compiled external unit.
+01: A Byte of BIT switches that identify the type of reference
and the size of the needed fix-up (WORD or DWORD). A lot
of guess-work led to the following interpretation:
7654  (bits 3-0 don't seem to be used)
00--  Locate item via a PROC Map,
01--  Locate item via a CSeg Map,
10--  Locate item via a Global VAR DSeg Map,
11--  Locate item via a Const DSeg Map,
--00  WORD offset has NO effective address adjustment,
--01  WORD offset HAS an effective address adjustment,
--10  WORD SEGMENT-Only fix-up (address of some PUBLIC
      segment),
--11  DWORD (FAR) pointer; possible effective address
      adjustment.
+02: A Word containing the offset within the Map table
referenced according to the above code scheme.
+04: A Word containing an offset within the target segment
which will be added to the effective address. For
example, a reference to the VAR DSeg Map will require a
final offset to locate the item (variable) within the DATA
SEGMENT being referenced here. This may also be needed
for references to LITERAL DATA embedded in a CODE SEGMENT.
+06: A Word containing the offset within the CODE or DATA
segment owning this entry that contains the area to be
patched with the value of the final effective address.
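
As an illustration, the eight-byte entry and its flag byte could be
decoded along these lines in Python. The sketch simply follows the
(admittedly guessed) bit interpretation given above; the function and
field names are invented, Words are assumed little-endian, and the
Windows-only records that begin with $FFFF are not handled here.

```python
import struct

MAPS = ("PROC Map", "CSeg Map", "Global VAR DSeg Map", "CONST DSeg Map")
KINDS = ("WORD offset, no adjustment", "WORD offset, adjusted",
         "WORD segment only", "DWORD far pointer")


def decode_fixup(entry: bytes) -> dict:
    """Decode one 8-byte fix-up record into named fields."""
    unit_ofs, flags, map_ofs, adjust, patch_ofs = struct.unpack("<BBHHH", entry)
    return {
        "unit_ofs": unit_ofs,               # +00: offset in Donor Unit List
        "map": MAPS[flags >> 6],            # +01: bits 7-6 pick the map
        "kind": KINDS[(flags >> 4) & 3],    # +01: bits 5-4 pick the fix-up
        "map_ofs": map_ofs,                 # +02: offset within that map
        "adjust": adjust,                   # +04: offset in target segment
        "patch_ofs": patch_ofs,             # +06: where to patch
    }
```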
In the WINDOWS environment, an additional format is possible and it
has the following appearance:
+00: A Word containing $FFFF which appears to serve as a format
identifier.
+02: A Word containing an Emulator Fix-Up type code. After
looking at many such entries in context with the object
code, the following scheme seems to be operative:
2-> target floating point op has SS: override prefix;
3-> target floating point op has CS: override prefix;
4-> target floating point op has ES: override prefix;
5-> target floating point op has NO override prefix;
6-> target floating point op is "FWAIT" (bytes $90 $9B).
+04: A Word that is probably always zero.
+06: A Word containing the offset to the floating-point
operation to be emulated. This operation is always prefixed
with a WAIT op ($9B) unless it is an FWAIT (bytes $90 $9B).
If an operation is not so prefixed, then no fix-up record is
generated for it.
These latter fix-up records are (probably) incorporated into the .EXE
file (following suitable transformations) so that the Windows Loader
can see and process them. Presumably, they are simply ignored if a
co-processor chip is present and working. If not, they tell the
loader where the emulated instructions are. What the loader does with
this information is pure guess-work but it probably works something
like this:
1) if the Emulator Type code in the word at +02 indicates
that a segment override prefix is present (codes 2..4),
replace the first three bytes of the instruction with the
following:
$CD $3C "xxyyyyyy" where "yyyyyy" is the least-significant
six bits of the "escape" byte (originally $D8..$DF) and
"xx" is the ones-complement of the two-bit segment
register value (00=ES, 01=CS, 10=SS, 11=DS).
This method would result in replacement of the WAIT op
($9B), the segment override prefix, and the "escape" byte
with the above string at program load time. This would
allow an application to run regardless of the availability
of co-processor support.
2) if the Emulator Type code in the word at +02 is 5, then
there is no override prefix. Replace the first two bytes
of the instruction with the following:
$CD $jj, where "jj" is the "escape" byte minus $A4, so that
$jj falls in the range $34..$3B.
3) if the Emulator Type code in the word at +02 is 6, then
the operation to emulate is FWAIT. Replace the $90 $9B
with $CD $3D.
Since $CD is the op-code for INT, emulation (if in effect) would
produce INT $34-$3D wherever a floating-point operation was found
that could be emulated.
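
The three conjectured cases can be written out as a Python sketch.
It models the guess-work above and nothing more; what the Windows
loader really does is unknown to me. The function name and arguments
are invented: kind is the type code from the Word at +02 and ofs the
offset from the Word at +06. The sketch emits $CD, the 8086 INT
op-code, and takes the segment number from bits 4-3 of the override
prefix byte ($26, $2E, $36, $3E for ES, CS, SS, DS) instead of
trusting the type code.

```python
def patch_for_emulation(code: bytearray, kind: int, ofs: int) -> None:
    """Rewrite one coprocessor op in place as an emulator interrupt."""
    if kind in (2, 3, 4):
        # WAIT, segment override prefix, escape byte ($D8..$DF)
        _wait, prefix, escape = code[ofs:ofs + 3]
        seg = (prefix >> 3) & 3                 # 0=ES 1=CS 2=SS 3=DS
        third = ((~seg & 3) << 6) | (escape & 0x3F)
        code[ofs:ofs + 3] = bytes([0xCD, 0x3C, third])
    elif kind == 5:
        # WAIT, escape byte: becomes INT $34..$3B
        code[ofs:ofs + 2] = bytes([0xCD, code[ofs + 1] - 0xA4])
    elif kind == 6:
        # NOP, WAIT: becomes INT $3D
        code[ofs:ofs + 2] = b"\xCD\x3D"
    else:
        raise ValueError("unknown emulator fix-up type %d" % kind)
```

Every case replaces bytes one for one, so no code moves and no other
fix-up offsets need adjusting.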
This approach has the advantage that we don't have to commit to
emulation or non-emulation at compile-time. Rather, the decision is
made at load time and is transparent to the user. It's interesting to
note that the DOS compiler generates such code without benefit of fix-
ups whenever both 8087 and emulation support are elected since the
emulator is a component of the SYSTEM unit in DOS. In WINDOWS, we
merely include a reference to WIN87EM plus the above fix-ups.
The technique relies on the fact that 8087 ops are necessarily
prefixed by the WAIT byte (except for the "FN..." variants). This
provides sufficient space to replace as above in-situ. This approach
WILL NOT work if the code contains floating-point instructions without
a WAIT prefix byte. If the object code requires an 80287 or an 80387
(for example), then it would seem that Interrupt 07H will have to
be serviced by WIN87EM. This is all guess-work for now. I haven't
seen any literature documenting WIN87EM techniques.
8. SUPPLIED PROGRAM
In order that the above information be made constructively useful, the
author has designed a program that automates the process of discovery.
It is not a work of art but it does give useful results provided your
PC has enough available memory.
The program source code has been re-organized many times as I simply
haven't been able to resist tinkering with it. Minor changes in its
output have been implemented to enhance its usefulness.
It should be obvious that the program was not designed "top-down".
Rather, it just evolved as each new discovery was made. Later on, it
seemed reasonable to try to document some of the relations between the
various lists and tables and the program tries to make some of these
relations clear, albeit with varying degrees of success.
It may not be obvious to all readers, but the program is actually
fighting a losing battle in many respects. The ".TPU" file was not
designed with the intent of enabling de-compilation, disassembly or
de-linking. Thus, some interesting semantic information is lost
forever since it's not needed for either compilation or debugging.
For example, it doesn't seem to be possible to determine with
certainty the source file for a CONST DSeg or GLOBAL VAR DSeg where
".OBJ" files are linked into the ".TPU" file. Of course, it MAY be
possible in certain cases but, in general, there is simply not enough
information available to definitely determine the source. This is due
to the fact that one ".OBJ" file may define such a DSeg and contain a
CSeg that refers to it but, if the DSeg is PUBLIC, it may also be
referred to by other CSegs. Each of the CSegs that make such
references to the DSeg view it as an EXTERNAL as far as fix-up data is
concerned. Therefore, it's impossible to determine which of the
referencing CSegs was drawn from the same ".OBJ" file as the DSeg.
8.1 TWU1
This is the main program. It will ask for the name of the unit to be
documented. Reply with the unit name only. The program will append
the ".TPU" extension and will search for the proper file. It will
also search the appropriate library file, if necessary.
The program will then ask if the unit is a DOS or WINDOWS unit and
will require a "w" or "d" answer. This determines which unit library
file to search (TURBO.TPL or TPW.TPL) for the SYSTEM unit (among
others).
The program will then ask if Dis-Assembly is desired and will require
a "y" or "n" answer. If "y", it also asks about the CPU.
The current directory will be searched first, followed by all
directories in the current PATH. If the .TPU file is not found, the
program will search for it in the "TURBO.TPL" or in the "TPW.TPL"
(Turbo Pascal Library) file as appropriate. Units in the "USES"
list(s) will also be loaded to enable resolution of LG items.
If the desired unit is found, the program will write a report to the
current directory named "unitname.lst" which contains its analysis.
The format of the report is such that it may be copied to a printer if
that printer supports TTY control codes with form-feeds. Be judicious
in doing this however since there can be a lot of information. Some
of the units supplied by Borland can produce almost 2 MB of report
output, depending on whether it's Version 6.0 for DOS or Version 1.0
for Windows (some supplied Windows Units are BIG).
8.1.1 UNIT TWU1EQU
This Unit contains constants, types and procedures of general utility
that are not strictly unit or I/O related. One of the more powerful
procedures is a general-purpose QuickSort procedure.
It also contains a Heap Error Function that keeps track of the high-
water mark of Heap Utilization of any program that uses it. This
function gets installed automatically.
This Unit makes SOME use of the INLINE assembler for speed and not out
of sheer necessity. Some of the routines are INLINE Macros to provide
for short expansions of otherwise overhead-ridden facilities.
8.1.2 UNIT TWU1RPT
This is a Unit that contains the text-file output routines required by
the main program. This relieves the main program of some of the
tedium of handling report formatting and pagination issues.
8.1.3 UNIT TWU1UAM
This Unit contains all Type Definitions, Structures, and primitive
Functions and Procedures required by the program for ".TPU" file
acquisition and analysis. All structures documented in this report
are also documented in the interface by means of the TYPE mechanism.
Some of the structures are difficult if not impossible to define using
ISO Pascal but Turbo Pascal provides the means for getting the job
done.
Some algorithms have been cast with object-orientation in mind and
have potential for re-use in other contexts. The unit computes a
cover for the dictionary and deduces relationships between dictionary,
code, data and the CSeg, PROC, CONST and VAR Maps discussed in
Sections 6.1 through 6.4 on Pages 35..37. This information is
retrieved by the main program to drive the printing process.
This Unit also loads all units specified in the USES list of the prime
unit to allow the names of externally defined types to be recovered on
the report. Array bounds are also retrieved in this way. The code
will search for needed units in the appropriate unit library file without
intervention. Close attention is paid to Heap Management and minimal
utilization of Heap storage. The dictionary areas of the Units
located in the USES list get loaded into the Heap at no extra charge.
Nothing but the dictionary area is of any use at this point. The name
and fully-qualified file name of each unit successfully loaded are
printed at the top of the listing. Unit version numbers must agree or
the unit will not be loaded. Dictionary covers are computed for each
loaded unit to aid in rapid LG-resolution.
Lack of sufficient Heap Storage will not necessarily cause the program
to fail. Heap Space MUST be available to load the primary unit and
perform the necessary analyses, but the secondary or nested units are
not essential. If they cannot be loaded, you merely lose some
descriptive information. If Heap exhaustion occurs at a critical step
however, the program will generate RunError 215.
8.1.4 UNIT TWU1UNA
This unit is a rudimentary disassembler. The output will not assemble
and may look strange to a "real" assembler programmer since I am not
well-qualified in this area. However, the basis for support of 80286,
80386 etc. processors is present as well as coprocessor support. Of
perhaps the greatest interest is that it does appear to decode the
emulated coprocessor instructions that are implemented via INT 34-3D
in the MS-DOS versions of Turbo Pascal.
Be warned however. The output is not guaranteed since this was coded
by myself and I am perhaps the rankest amateur that ever approached
this quite awful assembler language. For convenience, the operand
coding mimics TASM "Ideal" mode.
As is usual with programs of this type, error-recovery is minimal and
no context checking is performed. If the operation code is found to
be valid, then a valid instruction is assumed -- even if invalid
operands are present.
The only positives that apply to this program are that it doesn't slow
the cpu down (although a lot more output is produced), and it does let
one "tune" code for compactness by letting one view the results of the
coding directly. Also, incomplete instructions are handled as data
rather than overrunning into the next proc.
8.2 NOTES ON PROGRAM LOGIC
The following sections discuss a few of the methods employed by the
supplied program. There are no cutting-edge algorithms here. Results
counted for a lot more than technique.
8.2.1 FORMATTING THE DICTIONARY
Printing the unit dictionary area in a way that exposes its underlying
semantics is no small task. The unit dictionary area itself is a
rather amorphous-looking mass of data composed of hash tables, Name
Entries and stubs, type descriptors, etc. In order to present all
this information in a meaningful way, we have to reveal its structure
and this cannot be done by means of a sequential "browse" technique.
Rather, we have to visit all nodes in the dictionary area so that each
may be formatted in a way that exposes its function and meaning.
This is made necessary by the fact that items are added to the
dictionary as encountered and no convenient ordering of entry types
exists. What we have here is the problem of finding a minimal "cover"
for the dictionary area that properly exposes the content and
structure of the dictionary area.
To do this, we scan the dictionary recursively to determine the number
of structures that we need to map. Then we get heap storage for the
array of records that will hold the mapping information and repeat our
recursive dictionary scan, this time constructing the mapping records.
The recursive algorithm is "delicate" in that it is vulnerable to the
cycles that our analysis uncovers - particularly when polymorphic
objects are involved. Therefore, we have incorporated a simple little
trap that tries to discover such cycles and avoid them. It is
possible that the algorithm could fail for exceedingly complex units
but it handles the worst cases from Borland with ease. Prior versions
of this unit accomplished this task without recursion but required too
many tricky pointer manipulations that were environmentally sensitive,
so recursion was adopted. Since unit dictionaries don't tend to be
deeply nested, we get reasonable heap utilization coupled with stable
algorithms.
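
In outline, the count-then-fill scan with its cycle trap might look
like the Python sketch below. It is a simplified model and not the
code in TWU1UAM. The node table is hypothetical: it maps a dictionary
offset to a (kind, child offsets) pair, whereas the real program must
follow hash tables, Name Entries and type descriptors in the unit
itself. Python has no need for the counting pass; it is kept only to
mirror the description.

```python
from typing import Callable, Dict, List, Optional, Tuple

Nodes = Dict[int, Tuple[str, List[int]]]


def walk(nodes: Nodes, ofs: int, parent: Optional[int], level: int,
         seen: set, emit: Callable) -> None:
    if ofs in seen:                 # the "trap": never revisit a node
        return
    seen.add(ofs)
    kind, children = nodes[ofs]
    emit(ofs, kind, parent, level)
    for child in children:
        walk(nodes, child, ofs, level + 1, seen, emit)


def build_cover(nodes: Nodes, root: int) -> List[dict]:
    count = 0

    def bump(*_args) -> None:       # pass 1: count the structures
        nonlocal count
        count += 1

    walk(nodes, root, None, 0, set(), bump)
    cover: List[dict] = [{} for _ in range(count)]
    filled = 0

    def store(ofs, kind, parent, level) -> None:    # pass 2: fill records
        nonlocal filled
        cover[filled] = {"offset": ofs, "kind": kind,
                         "parent": parent, "level": level}
        filled += 1

    walk(nodes, root, None, 0, set(), store)
    cover.sort(key=lambda rec: rec["offset"])       # print in file order
    return cover
```

Sorting by offset at the end is my own choice; it lets the records
drive a front-to-back listing of the dictionary area.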
The result is an array containing one entry for each structure in the
unit dictionary area that is identifiable via traversal. Each entry
in the array contains information about nesting level, parent scope,
structure type and location. The array thus forms a set of
descriptors that drive the process of formatting the dictionary area
for display. The process may be likened to "painting by the numbers"
or to finding a way to lay tile on a flat surface using tiles of
differing shapes until the floor is exactly covered.
There is one significant limitation that needs to be pointed out. It
is not always possible to determine the "parent" or "owner" of a node
with certainty. The following discussion illustrates the problem of
finding the "real" parent of a Type Descriptor.
Almost every "type" in Turbo Pascal is actually derived from the basic
types that are defined in the SYSTEM.TPU unit -- e.g. "INTEGER",
"BYTE", etc. In addition, several of the Type Descriptors in the
SYSTEM unit are referenced by more than one Name Entry. Thus, we find
that a "many-to-one" relationship may exist between Name Entries and
Type Descriptors. How does one find out which is the entry that
actually gave rise to the Type Descriptor?
The Dictionary Area of a unit has some special properties, one of
which is the fact that the Name Entries for named Types are often
located quite near their primary type descriptors. The Dictionary
Area seems to be treated as an upward growing heap with the various
structures being added by Turbo as encountered. This makes it likely
that the Type "Q" header which gives rise to a type descriptor occurs
earlier in the Dictionary Area than any other entry which refers to
the same descriptor. We use this property to
allocate "ownership" but it may not be "fool-proof". Some type
descriptors are spawned by other type descriptors, especially for
structured types. Further, structured named types are often
accompanied by pointer types and this results in having multiple named
types sharing the same type descriptor. We don't attempt to allocate
"ownership" to "spawned" type descriptors but we do try to keep track
of scope information.
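
The "earliest reference wins" rule is easy to state in code. The
Python sketch below is a bare illustration of that heuristic, with
invented names and a made-up input shape: each Name Entry is given as
(its offset in the Dictionary Area, its name, the offset of the type
descriptor it refers to). It shares the weakness noted above and
does nothing special for "spawned" descriptors.

```python
from typing import Dict, Iterable, Tuple


def assign_owners(name_entries: Iterable[Tuple[int, str, int]]) -> Dict[int, str]:
    """Credit each type descriptor to the lowest-offset entry naming it."""
    owners: Dict[int, str] = {}
    for _ofs, name, type_ofs in sorted(name_entries):
        owners.setdefault(type_ofs, name)   # first (earliest) one sticks
    return owners
```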
A useful by-product of the above process is the ability to discover
many of the associations between Global Variables, Typed CONST's,
VMT's and the blocks in which they are declared or defined.
8.2.2 THE DISASSEMBLER
To start with, I apologize up front for mistakes which are bound to be
present in this routine. I am not really a MASM or TASM programmer
and I will not pretend otherwise. This being the case, the formatting
I have chosen for the operands may be erroneous or misleading and
might (if submitted to one of the "real" assemblers) produce object
code quite different from what is expected. I hope not, but I have to
admit it's possible.
My intention in adding this unit was to support hand-tuning of object
code. With practice and some effort, one can observe the effect on
the object module caused by specific Pascal coding. Thus, where
compactness or speed is an issue of paramount importance, disassembly
can be of help. In some cases, a simple re-arrangement of the local
variable declarations in a procedure can have a significant effect on
the size of the code if it means the difference between 1 and 2-byte
displacements for each instruction that references a specific local
variable. Potential applications along these lines seem almost
unlimited.
I adopted an operand format not unlike that of TASM "Ideal" mode since
it was more convenient to do so and looked more readable to me. I
relied on several reference books for guidance in decoding the entire
mess and I found that there were several flaws (read ERRORS) in some
of them which made the job that much more difficult. I then
compounded my problems by attempting to handle 80386 specific code
even though Turbo Pascal does not yet generate code specific to these
processors. I simply felt that the effort involved in writing any
sort of Dis-Assembly program for Turbo Pascal units was an effort best
experienced not more than once. With all this self-flagellation out
of my system once and for all, I will try to show the basic strategy
of the program and to explain the limitations and some of the
discoveries I made.
The routine is intended to be idiotically simple - i.e., no smarter
than the DEBUG command in principle. The basic idea is: pass some
text to the routine and get back ONE line derived from some prefix of
that text. Repeat as necessary until all text is gone. Thus, there
is no attempt to check the context of the text being processed. Also,
some configurations of the "modR/M" byte may be invalid for selected
instructions. I don't try to screen these out since the intent was to
look at the presumably correct code produced by TURBO Pascal -- not
devious assembly language. Also, this program regards WAIT operations
as "stand-alone" -- i.e., it doesn't check to see if a coprocessor
operation follows for which the WAIT might be regarded as a prefix.
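The calling pattern described above can be sketched as follows. This is
a Python illustration, not the supplied Pascal unit. The opcode table is
a deliberately tiny, hypothetical stand-in for the real recognizer
tables, and the function names are invented. Bytes that the table does
not know are emitted as a DB pseudo-op, one byte at a time, in line with
the error recovery this report describes for the disassembler.

```python
# Hypothetical, tiny opcode table: one-byte instructions without operands.
ONE_BYTE_OPS = {0x90: "NOP", 0xC3: "RET", 0x9B: "WAIT", 0xCB: "RETF"}


def disassemble_one(text):
    """Return (line, bytes_consumed) for the instruction at the start of text.

    No context is kept between calls. Bytes that are not in the table are
    emitted as a DB pseudo-op and exactly one byte is consumed.
    """
    opcode = text[0]
    if opcode in ONE_BYTE_OPS:
        return ONE_BYTE_OPS[opcode], 1
    return "DB      0%02Xh" % opcode, 1


def disassemble_all(text):
    """Call disassemble_one repeatedly until all text is gone."""
    lines = []
    position = 0
    while position < len(text):
        line, used = disassemble_one(text[position:])
        lines.append(line)
        position += used
    return lines


listing = disassemble_all(bytes([0x90, 0x9B, 0xD6, 0xC3]))
```

Note that WAIT comes back as a line of its own, whatever follows it,
which mirrors the stand-alone treatment described above.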
One area of real difficulty was figuring out the Floating-Point
emulations used by Turbo Pascal Version 6.0 for DOS that are
implemented by means of interrupts $34 through $3D. I don't know if I
got it right, but the results seem reasonable and consistent. In the
listing, the Interrupt is produced on one line, followed by its
parameters on the next line. The parameter line is given the op-code
"EMU_xxxx" where "xxxx" is the coprocessor op-code I felt was being
emulated. Interrupt $3C was a real puzzler but after seeing a lot of
code in context, I think that the segment override is communicated to
the emulator by means of the first byte after the $3C.
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 50
Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────
Normally, in a non-emulator environment, all coprocessor operations
(ignoring any WAIT prefixes) begin with $D8-$DF. What Borland (and
maybe Microsoft) seem to have done here is to change the $D8-$DF so
that bits 7 and 6 of this byte are replaced with the one's complement
of the 2-bit segment register number found in various 8086
instructions. This seems to be how an override for the DS register is
passed to the emulator. I don't KNOW this to be the correct
interpretation, but the code I have examined in context seems to work
under this scheme, so the disassembler uses it to interpret the
operand accordingly.
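The interpretation above can be written down in a few lines. The sketch
is in Python and follows only what is stated here: bits 7 and 6 of the
byte after the $3C carry the one's complement of the 2-bit segment
register number, and the remaining bits are those of the original
$D8-$DF byte. The function names are invented, and the scheme itself is
the tentative reading given above, not a documented interface.

```python
SEGMENT_REGISTERS = ["ES", "CS", "SS", "DS"]  # 8086 2-bit segment numbers 0..3


def decode_int3c_byte(byte):
    """Split the byte after INT 3Ch into (segment override, ESC opcode)."""
    complemented = (byte >> 6) & 0b11
    segment_number = complemented ^ 0b11      # undo the one's complement
    esc_opcode = byte | 0xC0                  # restore bits 7 and 6 to 1
    if not 0xD8 <= esc_opcode <= 0xDF:
        raise ValueError("not a modified coprocessor opcode: %02X" % byte)
    return SEGMENT_REGISTERS[segment_number], esc_opcode


def encode_int3c_byte(segment, esc_opcode):
    """Inverse of decode_int3c_byte, useful for checking the interpretation."""
    segment_number = SEGMENT_REGISTERS.index(segment)
    return ((segment_number ^ 0b11) << 6) | (esc_opcode & 0x3F)
```

For example, under this reading a byte of $99 after the $3C would stand
for a $D9 operation with a CS override, and $19 for the same operation
with a DS override.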
For 80x86 machines, the problem was somewhat simpler. The
disassembler takes a quick look at the first byte of the text. Almost
any byte is valid as the initial byte of an instruction, but some
instructions require more than one byte to hold the complete operation
code. Thus, step 1 classifies bytes in several ways that lead to
efficient recognition of valid operation codes.
Once the instruction has been identified in this way, it is more or
less easy to link to supplemental information that provides operand
editing guidance, etc.
The tables that embody the recognition scheme were constructed using
PARADOX (another fine Borland product) and suitably coded queries were
used to generate the actual Turbo Pascal code for compilation.
For those that are interested, the disassembler supports the address-
size and operand-size prefixes of the 80386 as well as 32-bit operands
and addresses but remember that Turbo Pascal doesn't generate these.
Provision is made for a trivial change that allows segments which
default to 32-bit mode to be handled as well.
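As a rough illustration of the prefix handling, the Python sketch below
scans the 80386 operand-size ($66) and address-size ($67) prefixes and
reports the sizes in effect for the instruction that follows. The
default_32bit flag is a hypothetical stand-in for the change mentioned
above; segment override, LOCK and REP prefixes are left out to keep the
example short.

```python
OPERAND_SIZE_PREFIX = 0x66
ADDRESS_SIZE_PREFIX = 0x67


def effective_sizes(text, default_32bit=False):
    """Return (operand_bits, address_bits, prefix_count) for one instruction.

    default_32bit is a hypothetical flag for segments that default to
    32-bit mode. Each prefix selects the opposite of the segment default.
    """
    operand_32 = address_32 = default_32bit
    count = 0
    for byte in text:
        if byte == OPERAND_SIZE_PREFIX:
            operand_32 = not default_32bit
        elif byte == ADDRESS_SIZE_PREFIX:
            address_32 = not default_32bit
        else:
            break
        count += 1
    return (32 if operand_32 else 16, 32 if address_32 else 16, count)
```

In a 16-bit segment, the bytes $66 $B8 therefore introduce a MOV with a
32-bit immediate operand, while the same bytes in a 32-bit segment
select a 16-bit operand.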
There is a simple mode variable that gets passed to the disassembler
by its caller which specifies the most-capable processor whose code is
to be handled. Codes are provided for the 8086 (8088 is the same),
80186 (same as 80286 without protected mode instructions), 80286
(80186 plus protected mode), and 80386. You now get asked which one
to use.
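One simple way to apply such a mode variable is to record, for each
recognized operation, the earliest processor that accepts it, and to
compare that against the caller's choice. The Python sketch below shows
the idea only; the names and the handful of table entries are chosen for
the illustration and are not the codes used by the supplied program.

```python
from enum import IntEnum


class Cpu(IntEnum):
    I8086 = 0    # 8088 is the same
    I80186 = 1   # 80286 without protected mode instructions
    I80286 = 2   # 80186 plus protected mode
    I80386 = 3


# A few first bytes and the earliest processor that accepts them.
MINIMUM_CPU = {
    0x90: Cpu.I8086,    # NOP
    0x60: Cpu.I80186,   # PUSHA
    0xC8: Cpu.I80186,   # ENTER
    0x63: Cpu.I80286,   # ARPL (protected mode)
    0x66: Cpu.I80386,   # operand-size prefix
}


def is_accepted(first_byte, mode):
    """True if the byte is valid for the most-capable processor selected."""
    required = MINIMUM_CPU.get(first_byte)
    return required is not None and required <= mode
```

A byte that fails this check would simply fall through to the ordinary
error path of the disassembler.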
No such specifier is provided for coprocessor support. What is there
is what I think an 80387 supports. I don't think that this is really
a problem if you don't try to use this disassembler for anything but
Turbo Pascal code.
Error recovery is predictably simple. The initial text byte is output
as the operand of a DB pseudo-op and provision is made to resume work
at the next byte of text.
I hope this program is found to be useful in spite of the errors it
must surely contain. I have yet to make much sense of the rules for
MASM or TASM operand coding and I found very little of value in many
of the so-called "texts" on the subject. I found myself in the
position of that legendary American in England watching a Cricket
match for the first time ("You mean it has RULES?").
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 51
Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────
9. UNIT LIBRARIES
I have examined .TPL files and conclude that their structure is
trivial. It's so easy to handle them that the program now routinely
examines either the TURBO.TPL or the TPW.TPL to resolve named types.
9.1 LIBRARY STRUCTURE
A Turbo Pascal Library (.TPL) file is a simple catenation of Turbo
Pascal Unit (.TPU) files. Since the size of a Unit may be determined
from the Unit Header (see Section 4.2, Page 16), it is simple to see
that one may "browse" through a .TPL file looking for an external unit
such as SYSTEM.TPU. The supplied program does just that in its unit
retrieval process so the TPUMOVER utility is no longer required for
processing of units in either the TURBO.TPL or in the TPW.TPL file.
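The browsing process amounts to a loop that reads a size from each Unit
Header and skips ahead by that amount. The Python sketch below shows the
loop only. The size computation is passed in as a function because the
header layout is described in Section 4.2 and is not reproduced here;
the "toy" header used for the demonstration is entirely MADE UP and must
not be mistaken for the real .TPU format. Any padding or alignment
between units would have to be folded into the size function.

```python
import struct


def split_library(library, unit_size):
    """Split a .TPL image into its constituent unit images.

    unit_size(header_bytes) must return the total size of the unit that
    starts at header_bytes[0]; the real computation belongs to the Unit
    Header description and is NOT reproduced here.
    """
    units = []
    position = 0
    while position < len(library):
        size = unit_size(library[position:])
        if size <= 0 or position + size > len(library):
            raise ValueError("bad unit size at offset %d" % position)
        units.append(library[position:position + size])
        position += size
    return units


def toy_unit_size(header):
    """MADE-UP header for the demo: 4-byte tag, then a 16-bit total size."""
    (size,) = struct.unpack_from("<H", header, 4)
    return size


def toy_unit(payload):
    """Build a demo unit image using the made-up header above."""
    return b"TOY!" + struct.pack("<H", 6 + len(payload)) + payload


library = toy_unit(b"system") + toy_unit(b"crt")
units = split_library(library, toy_unit_size)
```

Looking for one particular unit is then a matter of examining each
element of the result until the wanted unit name is found.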
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 52
Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────
10. INFERENCES DRAWN FROM ANALYSES
I have learned much about Turbo Pascal .EXE files from poring over the
output of the supplied program. It is possible to learn how to build
smaller .EXE files after contemplating the structure of Unit files.
It is also possible to avoid certain troublesome anomalies in the code
if one can see just what Turbo Pascal does when certain switch
declaratives are in effect.
10.1 LINKER GRANULARITY
The Linker appears to be able to resolve any code or data fragment
with a resolution that matches the granularity of the various "map"
tables in the unit file. The Code Map, the CONST DSeg Map and the
GLOBAL VAR Map each map things that can be included in the .EXE file
if referenced. Conversely, these things can also be excluded if not
referenced. Turbo Pascal manuals have been just a little vague about
how "smart" the "Smart Linker" actually is but the granularity of the
maps implies the extent of that "smartness". Assuming the linker does
in fact take advantage of this information and act on it, then we as
programmers can have a bit more control over the elements included
from Unit Files. This control can extend to GLOBAL VAR's that may be
used in particular circumstances, or not at all in others.
It seems that CONST DSeg and GLOBAL VAR Map entries are constructed
for each TYPED CONST or VAR "Declaration Part" encountered in the
Pascal source code. Thus, "Toolbox" type units can have their Typed
CONST's and GLOBAL VAR's partitioned along usage lines dedicated to a
small group of Procedures or Functions so that they only get included
if the appropriate Procedures or Functions are referenced or are
explicitly referenced by some external program.
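The consequence of this granularity can be modelled as a reachability
problem: an entry is included only if something that is itself included
refers to it. The Python sketch below is a model of that idea under the
assumption stated above (that the linker acts on the map entries); it is
not a description of the actual linker, and the unit, procedure and
entry names are hypothetical.

```python
def linked_entries(references, roots):
    """Return the set of map entries reachable from the root entries.

    references maps an entry name to the entries it refers to.
    """
    included = set()
    pending = list(roots)
    while pending:
        entry = pending.pop()
        if entry in included:
            continue
        included.add(entry)
        pending.extend(references.get(entry, ()))
    return included


# Hypothetical toolbox unit: two procedures, each with its own VAR part.
partitioned = {
    "PROC DrawBox": ["VAR part 1"],
    "PROC PlayTune": ["VAR part 2"],
}
# The same unit with both sets of variables in a single VAR part.
merged = {
    "PROC DrawBox": ["VAR part 1+2"],
    "PROC PlayTune": ["VAR part 1+2"],
}
program_uses = ["PROC DrawBox"]
```

In the partitioned layout a program that calls only DrawBox pulls in
"VAR part 1" alone. In the merged layout it pulls in the single combined
part, and with it the variables that only PlayTune needs.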
10.2 FLOATING-POINT EMULATION
Floating-Point emulation has some tricky cases -- particularly when
the In-Line Assembler is used. As noted earlier, the implementation
of Floating-Point Emulation is the responsibility of the SYSTEM unit
in the MS-DOS version and of WIN87EM in the WINDOWS version. The
state of the {$G±} directive toggle has an impact in these cases.
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 53
Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────
It would appear that 80286 code generation changes the way that
floating-point instructions are generated since the 80287 is implied
as the co-processor chip. In this case, the programmer has fine
control over the timing of WAIT instructions since 80287 instructions
don't automatically get prefixed by WAIT ops. When 8087 code is being
generated, these WAIT instructions are produced for 8087 instructions
since the 8087 requires it. This doesn't happen when the code is
targeted at the 80287. So far, so good. However, EMULATION of such
code gets trickier.
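The difference between the two targets can be shown with a very small
Python sketch. It models only the rule described above (a WAIT in front
of each coprocessor instruction when the 8087 is the target, none when
the 80287 is the target); the function name and flag are invented for
the illustration.

```python
WAIT = 0x9B


def emit_coprocessor(instruction, target_80287):
    """Return the bytes emitted for one coprocessor instruction.

    instruction is the raw ESC encoding (first byte $D8-$DF).
    """
    if not 0xD8 <= instruction[0] <= 0xDF:
        raise ValueError("not a coprocessor instruction")
    if target_80287:
        return bytes(instruction)
    return bytes([WAIT]) + bytes(instruction)


fadd = bytes([0xD8, 0xC1])  # FADD ST,ST(1)
```

For the 8087 target the result is $9B $D8 $C1; for the 80287 target it
is just $D8 $C1, leaving any synchronization to the programmer.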
10.2.1 VERSION 6.0 COMPILER FOR MS-DOS
It seems that the {$E±} directive doesn't work like it did in previous
versions. All code produced in 8087 mode seems to be emulated code.
I haven't found a way to get 8087 code generated if the compiler runs
on a machine that doesn't have a co-processor. It may be that the
directive works as documented if a co-processor is available on the
machine the compiler runs on.
10.2.2 VERSION 1.0 COMPILER FOR WINDOWS
It seems that the WIN87EM DLL in WINDOWS either needs to be able to
service 80287 code via Hardware Interrupt 07H, or the application
needs to be able to adapt itself to missing co-processor situations.
This is implied by the Emulation Fix-Ups discussed earlier. These
fix-ups are produced when 8087 code is being generated since the WAIT
prefix on an instruction provides space for loader patching. Since
WAIT prefixes are not automatically produced for 80287 instructions
(except for FWAIT), some other mechanism is needed. I don't know how
this situation is handled unless WIN87EM also services Interrupt 07H.
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 54
Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────
11. APPLICATION NOTES
One of the more obvious applications of this information would seem to
be in the area of a Cross-Reference Generator.
There is a very fine example of such a program in the public domain:
"PXL", written by Mr. R. N. Wisan. This program has
been around since the days of Turbo Pascal Version 1. The program has
been continually enhanced by the author in the way of features and for
support of the newer Turbo Pascal versions. It does not however solve
the problem of telling one which unit contains the definition of a
given symbol. In fairness to "PXL" however, this is no small problem
since the format of .TPU files keeps changing (Turbo 6.0 Units are
not object-code compatible with Turbo 5.x Units, and so on...) and
Mr. Wisan probably has more than enough other projects to keep himself
occupied.
However, for the user who is willing to work a little (maybe a lot?),
this document would seem to provide the information needed to add such
a function to his own pet cross-reference generator.
Further, with SIGNIFICANTLY more effort, it should be possible to do
much of the job of de-compilation -- provided the DEBUG dictionary is
present. At the very least, most declarations should be recoverable.
It's another thing entirely to try to reconstruct plausible TURBO
Pascal code from the CSegs. This would be a formidable task and lots
of knowledge about TURBO's code generators would have to be acquired.
At present, the only way I know to get this information is to have the
run-time library source codes and then work-work-work at testing code
produced by the compiler for a huge number of test case units. You
have to want to do this really badly in order to invest the time. I
am not that tired of living.
Finally, code-tuning is not really so tedious an exercise as one might
imagine. The disassembler makes it possible to experiment with many
variants of specific source code at the unit level and to observe the
effect on object code generated. With practice, there are certain
coding practices one can avoid such as indiscriminate use of the
"WITH" statement in Pascal (generates extra pointers and stack usage).
A really simple way of checking a code proposal is to create a small
test unit and fill it with sample coding. Disassembly of that unit
will show what code is produced. This can be a rewarding exercise!
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 55
Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────
12. ACKNOWLEDGEMENTS
This project would have been totally infeasible without the aid of
some very fine tools. As it was, several hundred man hours have been
expended on it and as you can see, there are a few unresolved issues
that have been (graciously) left for others to address. The tools
used by this author consisted of:
Turbo Pascal for Windows by Borland International
Turbo Pascal 6.0 Professional by Borland International
Microsoft WORD (version 5.5)
LIST (version 7.5) by Vernon D. Buerg
the DEBUG utility in MS-DOS Version 3.3.
PARADOX 3.5 by Borland International
QUATTRO PRO Version 2.0 by Borland International
TURBO ASSEMBLER 2.0 by Borland International
(PARADOX and QUATTRO PRO were used for data collection and analysis in
the course of coding the recognizer tables for the disassembler unit.)
The references listed were of great value in this project. [Intel85]
was a valuable source of information about coprocessor instructions as
well as offering hints about the differences between the 8086/8088 and
the 80286. The [Borland] TASM manuals offered further info on the
80186. [Nelson] provided presentations of well-organized data
directed at the problem of disassembly but the tables were flawed by a
number of errors which crept into my databases and which caused much
of the extra debugging effort. [Intel89] offered valuable insights on
the 80386 addressing schemes as well as the 32-bit data extensions.
Finally, [Brown] provided valuable clues on the Floating-Point
emulators used by Borland (and Microsoft?). As you can see, the
amount of hard information available to me on this project was quite
limited since I am unaware of any other existing body of literature on
this subject.
Finally, I am grateful to Mr. Anders Hejlsberg (Borland's Principal
Architect for TURBO PASCAL) for the time he spent discussing "cabbages
and kings" with me. TURBO PASCAL owes much of its syntactic style and
elegance to his efforts and good judgement.
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 56
Inside TURBO Pascal Unit Files
──────────────────────────────────────────────────────────────────────
13. REFERENCES
[Borland], TURBO PASCAL FOR WINDOWS Programmer's Guide, Borland
International, 1991.
[Borland], TURBO ASSEMBLER REFERENCE GUIDE, Borland International,
1988.
[Borland], TURBO ASSEMBLER USER'S GUIDE, Borland International, 1988.
[Borland], TURBO PASCAL 6.0 PROGRAMMING GUIDE, Borland International,
1990.
[Borland], TURBO PASCAL LIBRARY REFERENCE Version 6.0, Borland
International, 1990.
[Borland], TURBO PASCAL USER'S GUIDE Version 6.0, Borland
International, 1990.
[Brown], INTER191.ARC, Ralf Brown, 1991.
[Intel85], iAPX 286 PROGRAMMER'S REFERENCE MANUAL INCLUDING THE iAPX
286 NUMERIC SUPPLEMENT, Intel Corporation, 1985 (order
number 210498-003).
[Intel89], 386 SX MICROPROCESSOR PROGRAMMER'S REFERENCE MANUAL, Intel
Corporation, 1989 (order number 240331-001).
[Nelson], THE 80386 BOOK: ASSEMBLY LANGUAGE PROGRAMMER'S GUIDE FOR
THE 80386, Ross P. Nelson, Microsoft Press, 1988.
[Scanlon], 80286 ASSEMBLY LANGUAGE ON MS-DOS COMPUTERS, Leo J.
Scanlon, Brady, 1986.
──────────────────────────────────────────────────────────────────────
June 6, 1991 Page 57
14. INDEX
.OBJ file........14, 35, 37, 39
.RES file........39
.TPL file........8, 16, 45, 46, 52
.TPU
  file...........7, 9, 13, 16, 27, 45, 46, 52, 55
    size.........16
  SYSTEM.........8, 18, 19, 21, 27, 49, 52
{$E±}............54
{$G±}............53
80286............54
80287............44, 54
80387............44
8087.............42, 44, 54
Attribute
  ABSOLUTE.......9
  EXTERNAL.......24, 35
Call Model
  ASSEMBLER......24
  Dynamic........24
  FAR............24
  INLINE.........24
  INTERRUPT......24
CONST............7, 13, 14, 15, 22, 31, 37, 40, 41, 42, 47
Constraint.......33, 34
CSeg.............7, 13, 14, 35, 36, 37, 39, 40, 41, 42, 47
Defining block...37, 38
Directive........14, 15, 16, 24, 35, 39, 40
DLL..............7, 13, 38, 42
DMT..............14, 15, 24, 31, 37
Emulation........53
Emulator.........42, 43
External.........9, 35, 37, 42, 52
FWAIT............54
Granularity......53
Hash.............13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 31, 48
Include..........39, 40
Interface........7, 13, 14, 15, 16, 17, 18, 19, 26
Interrupt 07H....54
Library..........45
Locator
  LG.............9, 12, 21, 23, 26, 27, 30, 31, 32, 33, 34
  LL.............9, 13, 18, 26, 35
  offset.........9, 11, 12, 22, 24, 25, 31, 35, 36, 40, 41, 42
Method...........24
  CONSTRUCTOR....24
  DESTRUCTOR.....24
  Self...........22
Operand offset...42
Parameter........20, 23, 25, 34
PROC.............7, 13, 14, 24, 35, 36, 40, 42, 47
RunError.........47
SEGMENT..........42
Signature........7, 26
Stub.............9, 20, 23
  sSxx...........24
SYSTEM.TPS.......8, 19
TPW..............45, 52
TURBO............45, 52
Type Descriptor..21, 23, 26, 27, 28, 30, 31, 32, 33, 34, 49
VAR..............38, 47
VMT..............14, 15, 25, 31, 37
WIN87EM..........42, 44, 53, 54